Abstract

We consider the problem of block-coded communication, where in each block, the channel law belongs to one 
of two disjoint sets. The decoder aims to decode only messages that have passed through a channel from one of the
sets, and thus has to detect the set which contains the prevailing channel. We begin with the simplified case where 
each of the sets is a singleton. For any given code, we derive the optimum detection/decoding rule in the sense 
of the best trade-off among the probabilities of decoding error, false alarm, and misdetection, and also introduce 
sub-optimal detection/decoding rules which are simpler to implement. Then, various achievable bounds on the error 
exponents are derived, including the exact single-letter characterization of the random coding exponents for the 
optimal detector/decoder. We then extend the random coding analysis to general sets of channels, and show that 
there exists a universal detector/decoder which performs asymptotically as well as the optimal detector/decoder, 
when tuned to detect a channel from a specific pair of channels. The case of a pair of binary symmetric channels 
is discussed in detail. 


Index Terms 

Joint detection/decoding, error exponent, false alarm, misdetection, random coding, expurgation, mismatch 
detection, detection complexity, universal detection. 

I. Introduction 

Consider communicating over a channel, for which the prevailing channel law $P_{Y|X}$ ($X$ and $Y$ being the channel
input and output, respectively) is supposed to belong to a family of channels $\mathcal{W}$. For example, $\mathcal{W}$ could be a
singleton $\mathcal{W} = \{W\}$, or some ball centered at $W$ with respect to (w.r.t.) a given metric (say, total variation). This
ball represents some uncertainty regarding the channel, which may result, e.g., from estimation errors. The receiver
would also like to examine an alternative hypothesis, in which the channel $P_{Y|X}$ is not in $\mathcal{W}$, and belongs to a
different set $\mathcal{V}$, disjoint from $\mathcal{W}$. Such a detection procedure will be useful, for example, in the following cases:

1) Time-varying channels: In many protocols, communication begins with a channel estimation phase, and later
on, at the data transmission phase, the channel characteristics are tracked using adaptive algorithms [1,
Chapters 8 and 9]. However, it is common that, apart from its slow variation, the channel may occasionally also
change abruptly, for some reason. Then, the tracking mechanism totally fails, and it is necessary to initialize
communication again with a channel estimation phase. The detection of this event is usually performed at high
communication layers, e.g., by inspecting the output data bits of the decoder, and verifying their correctness
in some way. This procedure could be aided, or even replaced, by identifying a distinct change in the channel
as part of the decoding. Note that this problem is a block-wise version of the change-point detection problem
from sequential analysis [2], [3] (see also [4] and references therein for a recent related work).

2) Arbitrarily varying channels in blocks: In the same spirit, consider a bursty communication system, where
within each burst, the underlying channel may belong to either one of two sets, resulting from two very
distinctive physical conditions. For example, a wireless communication signal may occasionally be blocked
by some large obstacle which results in low channel gain compared to the case of free-space propagation,
or it may experience strong interference from other users [5]. The receiver should then decide if the current
channel enables reliable decoding.

3) Secure decoding: In channels that are vulnerable to intrusions, the receiver would like to verify that an
authorized transmitter has sent the message. In these cases, the channel behavior could serve as a proxy for
the identity of the transmitter. For example, a channel with a significantly lower or larger signal-to-noise
ratio (SNR) than predicted by the geographical distance between the transmitter and receiver could indicate
a possible attempt to intrude into the system. The importance of identifying such cases is obvious, e.g., if the
messages are used to control sensitive equipment at the receiver side.

4) Multiple access channels with no collisions: Consider a slotted sparse multiple access channel, for which two
transmitters are sending messages to a common receiver only in a very small portion of the available slots,^1
via different channels. Thus, it may be assumed that at each slot, at most one transmitter is active. The receiver
would like to identify the sender with high reliability. As might be dictated by practical considerations, the
same codebook is used by both transmitters and the receiver identifies the transmitter via a short header,
which is common to all codewords of the same transmitter.^2 The receiver usually identifies the transmitter
based on the received header only. Of course, this header is an undesired overhead, and so it is important to
maximize the detection performance for any given header. To this end, the receiver can also use the codeword
sent, and identify the transmitter using the different channel.

Thus, beyond the ordinary task of decoding the message, the receiver would also like to detect the event $P_{Y|X} \in \mathcal{V}$,
or, in other words, perform hypothesis testing between the null hypothesis $P_{Y|X} \in \mathcal{W}$ and the alternative hypothesis
$P_{Y|X} \in \mathcal{V}$. For example, if the channel quality is gauged by a single parameter, say, the crossover probability of a
binary symmetric channel (BSC), or the SNR of an additive white Gaussian noise channel (AWGN), then $\mathcal{W}$ and
$\mathcal{V}$ could be two disjoint intervals of this parameter.

^1 For simplicity, assume that each codeword occupies exactly a single slot.

^2 Also, if senders simply use different codebooks, then the detection performance would be related to the error probability of the codebook
obtained by joining the two codebooks. The random coding exponents for the case that the codebook of each transmitter is
chosen independently from the codebook of the other user can be obtained by slightly modifying the results of [6].
This problem of joint detection/decoding belongs to a larger class of hypothesis testing problems, in which after
performing the test, another task should be performed, depending on the chosen hypothesis. For example, in [7],
[8], the problem of joint hypothesis testing and Bayesian estimation was considered, and in [9] the subsequent task
is lossless source coding. A common theme for all the problems in this class is that separately optimizing the
detection and the task is sub-optimal, and so, joint optimization is beneficial.

In a more recent work [10], we have studied the related problem of joint detection and decoding for sparse
communication [11], which is motivated by strongly asynchronous channels [12], [13]. In these channels the transmitter
is either completely silent or transmits a codeword from a given codebook. The task of the detector/decoder is to
decide whether transmission has taken place, and if so, to decode the message. Three figures of merit were defined
in order to judge performance: (i) the probability of false alarm (FA), i.e., deciding that a message has been sent
when actually, the transmitter was silent and the channel output was pure noise, (ii) the probability of misdetection
(MD), that is, deciding that the transmitter was silent when it actually transmitted some message, and (iii) the
probability of inclusive error (IE), namely, not deciding on the correct message sent, due to either misdetection
or erroneous decoding. We have then found the optimum detector/decoder that minimizes the IE probability subject
to given constraints on the FA and the MD probabilities for a given codebook, and also provided single-letter
expressions for the exact random coding exponents. While this is a joint detector/decoder, we have also observed
that an asymptotic separation principle holds, in the following sense: A detector/decoder which achieves the optimal
exponents may be comprised of an optimal detector in the Neyman-Pearson sense for the FA and MD probabilities,
followed by ordinary maximum likelihood (ML) decoding.

In this paper, we study the problem of joint channel detection between two disjoint sets of memoryless channels
$\mathcal{W}$, $\mathcal{V}$, and decoding. We mainly consider discrete alphabets, but some of the results are easily adapted to continuous
alphabets. We begin by considering the case of simple hypotheses, namely $\mathcal{W} = \{W\}$ and $\mathcal{V} = \{V\}$. As in
[10], we measure the performance of the detector/decoder by its FA, MD and IE probabilities, derive the optimal
detector/decoder, and show that here too, an asymptotic separation principle holds. Due to the numerical instability
of the optimal detector, we also propose two simplified detectors, each of which is better suited to a different rate range.
Then, we discuss a plethora of lower bounds on the achievable exponents: For the optimal detector/decoder, we
derive single-letter expressions for the exact random coding exponents, as well as expurgated bounds which improve
the bounds at low rates. The exact random coding exponents are also derived for the simplified detectors/decoders.
In addition, we also derive Gallager/Forney-style random coding and expurgated bounds, which are simpler to
compute, and can be directly adapted to continuous channels. However, as we show in a numerical example,
the Gallager/Forney-style exponents may be strictly loose when compared to the exact exponents, even in simple
cases. Thus, using the refined analysis technique which is based on type class enumeration (see, e.g., [14], [15]
and references therein) and provides the exact random coding exponents is beneficial in this case. Afterwards, we
discuss a generalization to composite hypotheses, i.e., $\mathcal{W}$, $\mathcal{V}$ that are not singletons. Finally, we discuss in detail
the archetype example for which $W$, $V$ are a pair of BSCs.




The detection problem addressed in [10] can be seen to be a special case of the problem studied here, for which
the output of the channel $V$ is completely independent of its input, and plays the role of noise. It turns out that
the optimal detector/decoder and its properties for the problem studied here are straightforward generalizations of
[10], and thus we will discuss them rather briefly and only cite the relevant results from [10]. However, there is a
substantial difference in the analysis of the random coding detection exponents in [10], compared to the analysis
here. In [10], the discrimination is between the codebook and noise. The detector compares a likelihood which
depends on the codebook with a likelihood function that depends on the noise. So, when analyzing the performance
of random coding, the random choice of codebook only affects the distribution of the likelihood of the 'codebook
hypothesis'. By contrast, here, since we would like to detect the channel, the random choice of codebook affects the
likelihood of both hypotheses, and consequently, the two hypotheses may be highly dependent. One consequence
of this situation is that to derive the random coding exponents, it is required to analyze the joint distribution of
type class enumerators (cf. Subsection V-A), and not just rely on their marginal distributions. The expurgated and
Gallager/Forney-style exponents, as well as the simplified detectors/decoders, are studied here for the first time.

The outline of the rest of the paper is as follows. In Section II, we establish notation conventions and provide
some preliminaries, and in Section III, we formulate the problem of detecting between two channels. In Section
IV, we derive the optimum detector/decoder and discuss some of its properties, and also introduce sub-optimal
detectors/decoders. In Section V, we present our main results regarding various single-letter achievable exponents.
In Section VI, we discuss the problem of detection of composite hypotheses. Finally, in Section VII, we exemplify
the results for a pair of BSCs. We defer most of the proofs to the appendices.

II. Notation Conventions and Preliminaries 

Throughout the paper, random variables will be denoted by capital letters, specific values they may take will
be denoted by the corresponding lower case letters, and their alphabets, similarly as other sets, will be denoted
by calligraphic letters. Random vectors and their realizations will be denoted, respectively, by capital letters and
the corresponding lower case letters, both in the bold face font. Their alphabets will be superscripted by their
dimensions. For example, the random vector $\mathbf{X} = (X_1, \ldots, X_n)$ ($n$ a positive integer) may take a specific vector
value $\mathbf{x} = (x_1, \ldots, x_n)$ in $\mathcal{X}^n$, the $n$-th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of
this vector.

A joint distribution of a pair of random variables $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$, the Cartesian product alphabet of $\mathcal{X}$ and $\mathcal{Y}$,
will be denoted by $Q_{XY}$ and similar forms, e.g. $\tilde{Q}_{XY}$. Since usually $Q_{XY}$ will represent a joint distribution of $X$
and $Y$, we will abbreviate this notation by omitting the subscript $XY$, and denote, e.g., $Q_{XY}$ by $Q$. The $X$-marginal
($Y$-marginal) induced by $Q$ will be denoted by $Q_X$ (respectively, $Q_Y$), and the conditional distributions will be
denoted by $Q_{Y|X}$ and $Q_{X|Y}$. In accordance with this notation, the joint distribution induced by $Q_X$ and $Q_{Y|X}$ will
be denoted by $Q = Q_X \times Q_{Y|X}$.

For a given vector $\mathbf{x}$, let $\hat{Q}_{\mathbf{x}}$ denote the empirical distribution, that is, the vector $\{\hat{Q}_{\mathbf{x}}(x),\ x \in \mathcal{X}\}$, where $\hat{Q}_{\mathbf{x}}(x)$
is the relative frequency of the letter $x$ in the vector $\mathbf{x}$. Let $\mathcal{T}(P_X)$ denote the type class^3 associated with $P_X$, that
is, the set of all sequences $\{\mathbf{x}\}$ for which $\hat{Q}_{\mathbf{x}} = P_X$. Similarly, for a pair of vectors $(\mathbf{x}, \mathbf{y})$, the empirical joint
distribution will be denoted by $\hat{Q}_{\mathbf{xy}}$.

The mutual information of a joint distribution $Q$ will be denoted by $I(Q)$, where $Q$ may also be an empirical joint
distribution. The information divergence between $Q_X$ and $P_X$ will be denoted by $D(Q_X \| P_X)$, and the conditional
information divergence between the empirical conditional distribution $Q_{Y|X}$ and $P_{Y|X}$, averaged over $Q_X$, will be
denoted by $D(Q_{Y|X} \| P_{Y|X} | Q_X)$. Here too, the distributions may be empirical.
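As a concrete illustration of these quantities, the following minimal Python sketch computes an empirical joint distribution, its mutual information, and a divergence, all in nats. The function names and the dictionary representation of distributions are assumptions made for this example only and do not appear in the paper.

```python
from collections import Counter
from math import log


def empirical_joint(x, y):
    """Empirical joint distribution (joint type) of two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    return {pair: c / n for pair, c in Counter(zip(x, y)).items()}


def marginals(q):
    """X- and Y-marginals of a joint distribution given as {(a, b): prob}."""
    qx, qy = Counter(), Counter()
    for (a, b), p in q.items():
        qx[a] += p
        qy[b] += p
    return dict(qx), dict(qy)


def mutual_information(q):
    """I(Q) in nats (natural logarithm, as in the paper's convention)."""
    qx, qy = marginals(q)
    return sum(p * log(p / (qx[a] * qy[b])) for (a, b), p in q.items() if p > 0)


def divergence(q, p):
    """D(Q||P) in nats; infinite if Q puts mass where P does not."""
    total = 0.0
    for a, qa in q.items():
        if qa == 0:
            continue
        pa = p.get(a, 0.0)
        if pa == 0:
            return float("inf")
        total += qa * log(qa / pa)
    return total


# Illustrative sequences (hypothetical data): y is a noiseless copy of x.
x = (0, 0, 1, 1)
y = (0, 0, 1, 1)
q = empirical_joint(x, y)
print(q, mutual_information(q))
```

Representing a type by a dictionary keeps the sketch alphabet-agnostic; for large alphabets an array representation would be the more natural choice.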

The probability of an event $\mathcal{A}$ will be denoted by $\mathbb{P}\{\mathcal{A}\}$, and the expectation operator will be denoted by $\mathbb{E}\{\cdot\}$.
Whenever there is room for ambiguity, the underlying probability distribution $Q$ will appear as a subscript, i.e.,
$\mathbb{P}_Q\{\cdot\}$ and $\mathbb{E}_Q\{\cdot\}$. The indicator function will be denoted by $\mathbb{I}\{\cdot\}$. Sets will normally be denoted by calligraphic
letters. The complement of a set $\mathcal{A}$ will be denoted by $\overline{\mathcal{A}}$. Logarithms and exponents will be understood to be
taken to the natural base. The notation $[t]_+$ will stand for $\max\{t, 0\}$. We adopt the standard convention that when
a minimization (respectively, maximization) problem is performed on an empty set the result is $\infty$ (respectively,
$-\infty$).

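For two positive sequences, $\{a_n\}$ and $\{b_n\}$, the notation $a_n \doteq b_n$ will mean asymptotic equivalence in the
exponential scale, that is, $\lim_{n\to\infty} \frac{1}{n}\log\left(\frac{a_n}{b_n}\right) = 0$, and similar standard notations $\dot{\le}$ and $\dot{\ge}$ will also be used. When
$a_n$ is a sequence of conditional probabilities, i.e., $a_n = \mathbb{P}(\mathcal{A}_n | \mathcal{B}_n)$ for some pair of sequences of events $\{\mathcal{A}_n\}_{n=1}^{\infty}$
and $\{\mathcal{B}_n\}_{n=1}^{\infty}$, the notation $\mathbb{P}(\mathcal{A}_n | \mathcal{B}_n) \doteq b_n$ will mean

$$ \lim_{l\to\infty} \frac{1}{n_l} \log \frac{\mathbb{P}(\mathcal{A}_{n_l} | \mathcal{B}_{n_l})}{b_{n_l}} = 0, \tag{1} $$

where $\{n_l\}_{l=1}^{\infty}$ is the sequence of blocklengths such that $\mathbb{P}(\mathcal{B}_{n_l}) > 0$. We shall use the notation $a_n \doteq e^{-n\infty}$ when
$a_n$ decays super-exponentially to zero.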

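Throughout the sequel, we will make frequent use of the fact that $\sum_{i=1}^{k_n} a_n(i) \doteq \max_{1 \le i \le k_n} a_n(i)$ as long as
$\{a_n(i)\}$ are positive and $k_n \doteq 1$. Accordingly, for $k_n$ sequences of positive random variables $\{A_n(i)\}$, all defined
on a common probability space, and a deterministic sequence $b_n$,

$$
\begin{aligned}
\mathbb{P}\left\{ \sum_{i=1}^{k_n} A_n(i) \ge b_n \right\}
&\doteq \mathbb{P}\left\{ \max_{1 \le i \le k_n} A_n(i) \ge b_n \right\} && (2) \\
&= \mathbb{P} \bigcup_{i=1}^{k_n} \left\{ A_n(i) \ge b_n \right\} && (3) \\
&\doteq \sum_{i=1}^{k_n} \mathbb{P}\left\{ A_n(i) \ge b_n \right\} && (4) \\
&\doteq \max_{1 \le i \le k_n} \mathbb{P}\left\{ A_n(i) \ge b_n \right\}, && (5)
\end{aligned}
$$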


provided that $b'_n \doteq b_n$ implies $\mathbb{P}\{A_n(i) \ge b'_n\} \doteq \mathbb{P}\{A_n(i) \ge b_n\}$.^4 In simple words, summations and maximizations
are equivalent and can both be "pulled out" of $\mathbb{P}\{\cdot\}$ without changing the exponential order, as long as $k_n \doteq 1$.
The equalities in (2)-(5) will be termed henceforth 'the union rule' (UR). By the same token,

$$
\begin{aligned}
\mathbb{P}\left\{ \sum_{i=1}^{k_n} A_n(i) \le b_n \right\}
&\doteq \mathbb{P}\left\{ \max_{1 \le i \le k_n} A_n(i) \le b_n \right\} && (6) \\
&= \mathbb{P} \bigcap_{i=1}^{k_n} \left\{ A_n(i) \le b_n \right\}, && (7)
\end{aligned}
$$

and these equalities will be termed henceforth 'the intersection rule' (IR).

^3 The blocklength will not be displayed since it will be understood from the context.

The natural candidate for $k_n$ is the number of joint types possible for a given block length $n$, and this fact, along
with all other rules of the method of types [16], will be used extensively henceforth, without explicit reference.
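The exponential equivalence between a sum and a maximum can be checked numerically. The sketch below uses a hypothetical family $a_n(i) = \exp\{-n[0.1 + (i/n)^2]\}$ with $k_n = n + 1$ terms (chosen only for illustration) and compares the exponents $-\frac{1}{n}\log\sum_i a_n(i)$ and $-\frac{1}{n}\log\max_i a_n(i)$; their gap is bounded by $\frac{1}{n}\log k_n$, which vanishes when $k_n$ grows sub-exponentially.

```python
from math import exp, log


def exponents(n):
    """Exponents of the sum and of the maximum of k_n = n + 1 positive terms.

    The family a_n(i) = exp(-n * (0.1 + (i / n) ** 2)) is a hypothetical example.
    """
    terms = [exp(-n * (0.1 + (i / n) ** 2)) for i in range(n + 1)]
    e_sum = -log(sum(terms)) / n
    e_max = -log(max(terms)) / n
    return e_sum, e_max


for n in (10, 100, 1000):
    e_sum, e_max = exponents(n)
    # The gap e_max - e_sum lies between 0 and log(k_n) / n.
    print(n, e_sum, e_max, e_max - e_sum, log(n + 1) / n)
```

The same bound explains why the number of joint types, which is polynomial in $n$, is a natural choice for $k_n$.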


III. Problem Formulation 

Consider a discrete memoryless channel (DMC), characterized by a finite input alphabet $\mathcal{X}$, a finite output alphabet
$\mathcal{Y}$, and a given matrix of single-letter transition probabilities $\{P_{Y|X}(y|x)\}_{x \in \mathcal{X}, y \in \mathcal{Y}}$. Let $\mathcal{C}_n = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\} \subset \mathcal{X}^n$
denote a codebook for blocklength $n$ and rate $R$, for which the transmitted codeword is chosen with a
uniform probability distribution over the $M = \lceil e^{nR} \rceil$ codewords. The conditional distribution $P_{Y|X}$ may either
satisfy $P_{Y|X} = W$ (the null hypothesis), or $P_{Y|X} = V$ (the alternative hypothesis). It is required to design
a detector/decoder which is oriented to decode messages only arriving via the channel $W$. Formally, such a
detector/decoder $\phi$^5 is a partition of $\mathcal{Y}^n$ into $M+1$ regions, denoted by $\{\mathcal{R}_m\}_{m=0}^{M}$. If $\mathbf{y} \in \mathcal{R}_m$ for some $1 \le m \le M$
then the $m$-th message is decoded. If $\mathbf{y} \in \mathcal{R}_0$ (the rejection region) then the channel $V$ is identified, and no decoding
takes place.

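For a codebook $\mathcal{C}_n$ and a given detector/decoder $\phi$, the probability of false alarm (FA) is given by

$$ P_{\mathrm{FA}}(\mathcal{C}_n, \phi) \triangleq \frac{1}{M} \sum_{m=1}^{M} W(\mathcal{R}_0 | \mathbf{x}_m), \tag{8} $$

the probability of misdetection (MD) is given by

$$ P_{\mathrm{MD}}(\mathcal{C}_n, \phi) \triangleq \frac{1}{M} \sum_{m=1}^{M} V(\overline{\mathcal{R}_0} | \mathbf{x}_m), \tag{9} $$

and the probability of inclusive error (IE) is defined as

$$ P_{\mathrm{IE}}(\mathcal{C}_n, \phi) \triangleq \frac{1}{M} \sum_{m=1}^{M} W(\overline{\mathcal{R}_m} | \mathbf{x}_m). \tag{10} $$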

Thus, the IE event is the total error event, namely, when the correct codeword $\mathbf{x}_m$ is not decoded either because
of a FA or an ordinary erroneous decoding.^6 The probability of decoding to an erroneous codeword, excluding the
rejection region, is termed the exclusive error (EE) probability and is defined as

$$ P_{\mathrm{EE}}(\mathcal{C}_n, \phi) \triangleq P_{\mathrm{IE}}(\mathcal{C}_n, \phi) - P_{\mathrm{FA}}(\mathcal{C}_n, \phi). \tag{11} $$

^4 Consider the case where $b_n = e^{nb}$ ($b$ being a constant, independent of $n$) and the exponent of $\mathbb{P}\{A_n(i) \ge e^{nb}\}$ is a continuous function
of $b$.

^5 The decoder $\phi$ naturally depends on the blocklength via the codebook $\mathcal{C}_n$, but this will be omitted.


When obvious from context, we will omit the notation of the dependence of these probabilities on $\mathcal{C}_n$ and $\phi$.
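For very short codes, the probabilities in (8)-(11) can be evaluated by brute-force enumeration of all output sequences. The following minimal sketch does so for a pair of BSCs; the repetition codebook, the crossover probabilities, and the distance-threshold rule `threshold_rule` are hypothetical choices made only to have something concrete to evaluate, and the rule is not the optimal detector/decoder of this paper.

```python
from itertools import product


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def bsc_likelihood(y, x, p):
    """Probability of output y given input x over a BSC with crossover p."""
    d = hamming(x, y)
    return (p ** d) * ((1 - p) ** (len(x) - d))


def error_probabilities(codebook, decide, p_w, p_v):
    """Evaluate (8)-(11) by summing over all binary outputs.

    decide(y) returns 0 for rejection or m in 1..M for decoding message m.
    """
    n, M = len(codebook[0]), len(codebook)
    p_fa = p_md = p_ie = 0.0
    for y in product((0, 1), repeat=n):
        decision = decide(y)
        for m, x in enumerate(codebook, start=1):
            w = bsc_likelihood(y, x, p_w)
            v = bsc_likelihood(y, x, p_v)
            if decision == 0:
                p_fa += w / M  # channel W, but y falls in the rejection region
            else:
                p_md += v / M  # channel V, but y is not rejected
            if decision != m:
                p_ie += w / M  # channel W, message m not decoded
    return {"FA": p_fa, "MD": p_md, "IE": p_ie, "EE": p_ie - p_fa}


# Hypothetical example: length-5 repetition code, reject if no codeword is within distance 1.
CODE = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]


def threshold_rule(y, codebook=CODE, radius=1):
    dists = [hamming(x, y) for x in codebook]
    best = min(range(len(dists)), key=dists.__getitem__)
    return 0 if dists[best] > radius else best + 1


print(error_probabilities(CODE, threshold_rule, p_w=0.05, p_v=0.4))
```

The enumeration costs $2^n M$ likelihood evaluations, so it is only meant as a sanity check on toy examples.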

For a given code $\mathcal{C}_n$, we are interested in achievable trade-offs between $P_{\mathrm{FA}}$, $P_{\mathrm{MD}}$ and $P_{\mathrm{IE}}$. Consider the following
problem:

$$
\begin{aligned}
&\text{minimize} && P_{\mathrm{IE}} \\
&\text{subject to} && P_{\mathrm{FA}} \le \epsilon_{\mathrm{FA}} \\
& && P_{\mathrm{MD}} \le \epsilon_{\mathrm{MD}}
\end{aligned}
\tag{12}
$$

where $\epsilon_{\mathrm{FA}}$ and $\epsilon_{\mathrm{MD}}$ are given prescribed quantities, and it is assumed that these two constraints are not contradictory.
Indeed, there is some tension between $P_{\mathrm{MD}}$ and $P_{\mathrm{FA}}$ as they are related via the Neyman-Pearson lemma [17, Theorem
11.7.1]. For a given $\epsilon_{\mathrm{FA}}$, the minimum achievable $P_{\mathrm{MD}}$ is positive, in general. It is assumed then that the prescribed
value of $\epsilon_{\mathrm{MD}}$ is not smaller than this minimum. In the problem under consideration, it makes sense to relax the
tension between the two constraints to a certain extent, in order to allow some freedom to minimize $P_{\mathrm{IE}}$ under these
constraints. While this is true for any finite blocklength, as we shall see (Proposition 3), an asymptotic separation
principle holds, and the optimal detector in terms of exponents has full tension between the FA and MD exponents.
The optimal detector/decoder for the problem (12) will be denoted by $\phi^*$.

Remark 1. Naturally, one can use the detector/decoder $\phi^*$ for messages sent via $V$. The detection performance for
this detector/decoder would simply be obtained by exchanging the meaning of FA with MD.

Our goal is to find the optimum detector/decoder for the problem (12), and then analyze the achievable exponents
associated with the resulting error probabilities.

IV. Joint Detectors/Decoders 

In this section, we discuss the optimum detector/decoder for the problem (12), and some of its properties. We
will also derive an asymptotically optimal version, and discuss simplified decoders, whose performance is close to
optimal in some regimes.

A. The Optimum Detector/Decoder 

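Let $a, b \in \mathbb{R}$, and define the detector/decoder $\phi^* = \{\mathcal{R}^*_m\}_{m=0}^{M}$, where:

$$ \mathcal{R}^*_0 \triangleq \left\{ \mathbf{y} : a \cdot \sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) + \max_{m} W(\mathbf{y}|\mathbf{x}_m) < b \cdot \sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right\}, \tag{13} $$

and

$$ \mathcal{R}^*_m \triangleq \overline{\mathcal{R}^*_0} \cap \left\{ \mathbf{y} : W(\mathbf{y}|\mathbf{x}_m) \ge \max_{k \ne m} W(\mathbf{y}|\mathbf{x}_k) \right\}, \tag{14} $$

where ties are broken arbitrarily.

^6 This definition is conventional in related problems. For example, in Forney's error/erasure setting [18], one of the events defined and
analyzed is the total error event, which is comprised of a union of an undetected error event and an erasure event.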


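Lemma 2. Let a codebook $\mathcal{C}_n$ be given, let $\phi^*$ be as above, and let $\phi$ be any other partition of $\mathcal{Y}^n$ into $M + 1$
regions. If $P_{\mathrm{FA}}(\mathcal{C}_n, \phi) \le P_{\mathrm{FA}}(\mathcal{C}_n, \phi^*)$ and $P_{\mathrm{MD}}(\mathcal{C}_n, \phi) \le P_{\mathrm{MD}}(\mathcal{C}_n, \phi^*)$ then $P_{\mathrm{IE}}(\mathcal{C}_n, \phi) \ge P_{\mathrm{IE}}(\mathcal{C}_n, \phi^*)$.

Proof: The proof is almost identical to the proof of [10, Lemma 1] and thus omitted. ■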
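To make the decision rule of this subsection concrete, here is a minimal sketch that evaluates it directly for a small codebook: reject when the weighted sum and maximum of the $W$-likelihoods fall below the weighted sum of the $V$-likelihoods, and otherwise decode by maximum likelihood under $W$. The BSC likelihood helper, the codebook and the numerical values of `a` and `b` are illustrative assumptions; ties in the ML step are broken in favor of the lowest index, which is one arbitrary choice among those the text permits.

```python
def bsc_likelihood(y, x, p):
    """Probability of output y given input x over a BSC with crossover p."""
    d = sum(s != t for s, t in zip(x, y))
    return (p ** d) * ((1 - p) ** (len(x) - d))


def optimal_decision(y, codebook, a, b, p_w, p_v):
    """Return 0 (reject: channel V identified) or the decoded message index in 1..M."""
    w = [bsc_likelihood(y, x, p_w) for x in codebook]
    v = [bsc_likelihood(y, x, p_v) for x in codebook]
    # Rejection region: a * sum_m W + max_m W < b * sum_m V.
    if a * sum(w) + max(w) < b * sum(v):
        return 0
    # Otherwise, ordinary ML decoding tuned to W (first maximizer wins ties).
    return 1 + max(range(len(w)), key=w.__getitem__)


# Hypothetical toy setting.
codebook = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1)]
y = (1, 1, 0, 0, 0, 0)
print(optimal_decision(y, codebook, a=1.0, b=1.0, p_w=0.05, p_v=0.4))
```

Working with raw likelihoods is only viable for short blocks; for longer blocks the products underflow, which is one reason a logarithmic-scale formulation is preferable in practice.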

Note that this detector/decoder is optimal (in the Neyman-Pearson sense) for any given blocklength $n$ and
codebook $\mathcal{C}_n$. Thus, upon a suitable choice of the coefficients $a$ and $b$, it solves the problem (12) exactly. As is
common, to assess the achievable performance, we resort to large blocklength analysis of error exponents. For a
given sequence of codes $\mathcal{C} \triangleq \{\mathcal{C}_n\}_{n=1}^{\infty}$ and a detector/decoder $\phi$, the FA exponent is defined as

$$ E_{\mathrm{FA}}(\mathcal{C}, \phi) \triangleq \liminf_{n\to\infty} -\frac{1}{n} \log P_{\mathrm{FA}}(\mathcal{C}_n, \phi), \tag{15} $$

and the MD exponent $E_{\mathrm{MD}}(\mathcal{C}, \phi)$ and the IE exponent $E_{\mathrm{IE}}(\mathcal{C}, \phi)$ are defined similarly. The asymptotic version of
(12) is then stated as finding the detector/decoder which achieves the largest $E_{\mathrm{IE}}$ under constraints on $E_{\mathrm{FA}}$ and $E_{\mathrm{MD}}$.

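To affect these error exponents, the coefficients $a$, $b$ in (13) need to exponentially increase/decrease as functions
of $n$. Denoting $a \triangleq e^{n\alpha}$ and $b \triangleq e^{n\beta}$, the rejection region of Lemma 2 becomes

$$ \mathcal{R}^*_0 = \left\{ \mathbf{y} : e^{n\alpha} \sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) + \max_{m} W(\mathbf{y}|\mathbf{x}_m) < e^{n\beta} \sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right\}. \tag{16} $$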

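For $\alpha > 0$, the ML term $\max_m W(\mathbf{y}|\mathbf{x}_m)$ in (16) is negligible w.r.t. the sum $e^{n\alpha} \sum_{m} W(\mathbf{y}|\mathbf{x}_m)$ that precedes it on the
left-hand side (l.h.s.), and the obtained rejection region is asymptotically equivalent to

$$ \mathcal{R}'_0 \triangleq \left\{ \mathbf{y} : e^{n\alpha} \sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) < e^{n\beta} \cdot \sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right\}, \tag{17} $$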

which corresponds to an ordinary Neyman-Pearson test between the hypotheses that the channel is $W$ or $V$.
Thus, unlike the fixed blocklength case, asymptotically, we obtain a complete tension between the FA and MD
probabilities. Also, comparing (17) and (16), we may observe that the term $\max_m W(\mathbf{y}|\mathbf{x}_m)$ in $\mathcal{R}^*_0$ is added in
favor of the hypothesis $W$. So, in case of a tie in the ordinary Neyman-Pearson test (17), the optimal
detector/decoder will actually decide in favor of $W$.

As the next proposition shows, the above discussion implies that there is no loss in error exponents when using
the detector/decoder $\phi'$, whose rejection region is as in (17), and if $\mathbf{y} \notin \mathcal{R}'_0$ then ordinary ML decoding for $W$
is used, as in (14). This implies an asymptotic separation principle between detection and decoding: the optimal
detector can be used without considering the subsequent decoding, and the optimal decoder can be used without
considering the preceding detection. As a result, asymptotically, there is only a single degree of freedom to control
the exponents. Thus, when analyzing error exponents in Section V, we will assume that $\phi'$ is used, and since (17)
depends on the difference $\alpha - \beta$ only, we will set henceforth $\beta = 0$ for $\phi'$. The parameter $\alpha$ will be used to control
the trade-off between the FA and MD exponents, just as in ordinary hypothesis testing.
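The separated detector/decoder with $\beta = 0$ can be sketched in the logarithmic domain: the rejection test compares $n\alpha + \log\sum_m W(\mathbf{y}|\mathbf{x}_m)$ with $\log\sum_m V(\mathbf{y}|\mathbf{x}_m)$, and ML decoding under $W$ follows if the test does not reject. The sketch below assumes a pair of BSCs and uses a standard log-sum-exp helper to avoid underflow of the individual likelihoods; it still enumerates all $M$ codewords, so it is an illustration for small codebooks rather than a practical implementation. All names and numerical values are assumptions of this example.

```python
from math import exp, log


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + log(sum(exp(v - top) for v in values))


def bsc_loglik(y, x, p):
    """Log-likelihood of output y given input x over a BSC with crossover p."""
    d = sum(s != t for s, t in zip(x, y))
    return d * log(p) + (len(x) - d) * log(1 - p)


def separated_decision(y, codebook, alpha, p_w, p_v):
    """Neyman-Pearson detection (beta = 0) followed by ML decoding tuned to W."""
    n = len(y)
    lw = [bsc_loglik(y, x, p_w) for x in codebook]
    lv = [bsc_loglik(y, x, p_v) for x in codebook]
    if n * alpha + log_sum_exp(lw) < log_sum_exp(lv):
        return 0  # reject: channel V identified
    return 1 + max(range(len(lw)), key=lw.__getitem__)


# Hypothetical toy setting with a block long enough for raw likelihoods to underflow.
n = 2000
codebook = [(0,) * n, (1,) * n]
y = (1,) * 800 + (0,) * 1200  # looks like the first codeword through a noisy channel
print(separated_decision(y, codebook, alpha=0.0, p_w=0.05, p_v=0.4))
```

Increasing `alpha` shrinks the rejection region (fewer false alarms, more misdetections), which is the single degree of freedom mentioned above.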

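Proposition 3. For any given sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$, and given constraints on the FA and MD exponents,
the detector/decoder $\phi'$ achieves the same IE exponent as $\phi^*$.

Proof: Assume that the coefficients $\alpha$, $\beta$ of $\phi^*$ (in (16)) are tuned to satisfy constraints on the FA and MD
exponents, say $E_{\mathrm{FA}}$ and $E_{\mathrm{MD}}$. Let us consider replacing $\phi^*$ by $\phi'$, with the same $\alpha$, $\beta$. Now, given that the $m$th
codeword was transmitted, the conditional IE event (cf. (10)) is the union of the FA event and the event

$$ \left\{ W(\mathbf{Y}|\mathbf{x}_m) < \max_{k \ne m} W(\mathbf{Y}|\mathbf{x}_k) \right\}, \tag{18} $$

namely, an ordinary ML decoding error. The union bound then implies

$$ P_{\mathrm{IE}}(\mathcal{C}_n, \phi') \le P_{\mathrm{O}}(\mathcal{C}_n) + P_{\mathrm{FA}}(\mathcal{C}_n, \phi'), \tag{19} $$

where $P_{\mathrm{O}}(\mathcal{C}_n)$ is the ordinary decoding error probability, assuming the ML decoder tuned to $W$. As the union
bound is asymptotically exponentially tight for a union of two events, then

$$
\begin{aligned}
P_{\mathrm{IE}}(\mathcal{C}_n, \phi^*) &\doteq P_{\mathrm{O}}(\mathcal{C}_n, \phi^*) + P_{\mathrm{FA}}(\mathcal{C}_n, \phi^*) && (20) \\
&\doteq \max\left\{ P_{\mathrm{O}}(\mathcal{C}_n, \phi^*),\ P_{\mathrm{FA}}(\mathcal{C}_n, \phi^*) \right\}, && (21)
\end{aligned}
$$

or

$$ E_{\mathrm{IE}}(\mathcal{C}, \phi^*) = \min\left\{ E_{\mathrm{O}}(\mathcal{C}, \phi^*),\ E_{\mathrm{FA}}(\mathcal{C}, \phi^*) \right\}. \tag{22} $$

Now, the ordinary decoding error probability is the same for $\phi^*$ and $\phi'$ and so the first term in (21) is the same
for both detectors/decoders. Also, given any constraint on the MD exponent, the detector defined by $\mathcal{R}'_0$ achieves
the maximal FA exponent, and so

$$ E_{\mathrm{FA}}(\mathcal{C}, \phi^*) \le E_{\mathrm{FA}}(\mathcal{C}, \phi'). \tag{23} $$

In light of (22), this implies that $\phi'$ satisfies the MD and FA constraints, and at the same time, achieves an IE
exponent at least as large as that of $\phi^*$. ■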

The achievable exponent bounds will be proved by random coding over some ensemble of codes. Letting over-bar
denote an average w.r.t. some ensemble, we will define the random coding exponents as

$$ \overline{E}_{\mathrm{FA}}(\phi) \triangleq \lim_{l\to\infty} -\frac{1}{n_l} \log \overline{P_{\mathrm{FA}}(\mathcal{C}_{n_l}, \phi)}, \tag{24} $$

where $\{n_l\}_{l=1}^{\infty}$ is a sub-sequence of blocklengths. When we assume a fixed composition ensemble with distribution
$P_X$, this sub-sequence will simply be the blocklengths such that $\mathcal{T}(P_X)$ is not empty, and when we assume the
independent identically distributed (i.i.d.) ensemble, all blocklengths are valid. To comply with definition (15), one
can obtain codes which are good for all sufficiently large blocklengths by slightly modifying the input distribution.
The MD exponent $\overline{E}_{\mathrm{MD}}(\phi)$ and the IE exponent $\overline{E}_{\mathrm{IE}}(\phi)$ are defined similarly, where the three exponents share the
same sequence of blocklengths.

Now, if we provide random coding exponents for the FA, MD and ordinary decoding exponents, then the existence
of a good sequence of codes can be easily shown. Indeed, the Markov inequality implies that

$$ \mathbb{P}\left\{ P_{\mathrm{FA}}(\mathcal{C}_{n_l}, \phi) \ge \exp\left[ -n_l \left( \overline{E}_{\mathrm{FA}}(\phi) - \delta \right) \right] \right\} \le e^{-n_l \delta / 2}, \tag{25} $$

for all $l$ sufficiently large. Thus, with probability tending to 1, the chosen codebook will have FA probability
not larger than $\exp[-n(\overline{E}_{\mathrm{FA}}(\phi) - \delta)]$. As the same can be said of the MD probability and the ordinary error
probability, one can find a sequence of codebooks with simultaneously good FA, MD and ordinary decoding
error probabilities, and from (22), also good IE probability. For this reason, henceforth we will only focus on the
detection performance, namely the FA and MD exponents. The IE exponent can be simply obtained by (22) and
the known bounds of ordinary decoding, namely: (i) the standard Csiszár and Körner random coding bound [16,
Theorem 10.2] (and its tightness [16, Problem 10.34])^7 and the expurgated bound [16, Problem 10.18] for fixed
composition ensembles, (ii) the random coding bound [21, Theorem 5.6.2], and the expurgated bound [21, Theorem
5.7.1] for the ensemble of i.i.d. codes.

Beyond the fact that $\phi'$ is a slightly simpler detector/decoder than $\phi^*$, it also enables us to prove a very simple
relation between its FA and MD exponents. For the next proposition, we will use the notation $\phi'_\alpha$ and $\mathcal{R}'_{0,\alpha}$ to
explicitly denote their dependence on $\alpha$.

Proposition 4. For any ensemble of codes such that $\overline{E}_{\mathrm{FA}}(\phi'_\alpha)$ and $\overline{E}_{\mathrm{MD}}(\phi'_\alpha)$ are continuous in $\alpha$, the FA and
MD exponents of $\phi'_\alpha$ satisfy

$$ \overline{E}_{\mathrm{FA}}(\phi'_\alpha) = \overline{E}_{\mathrm{MD}}(\phi'_\alpha) + \alpha. \tag{26} $$


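Proof: For typographical convenience, let us assume that the sub-sequence of blocklengths is simply $\mathbb{N}$. The
detector/decoder $\phi'_\alpha$ is the one which minimizes the FA probability under an MD probability constraint. Considering
$e^{-n\alpha} > 0$ as a positive Lagrange multiplier, it is readily seen that for any given code, $\phi'_\alpha$ minimizes the following
Lagrangian:

$$
\begin{aligned}
L(\mathcal{C}_n, \phi, \alpha) &\triangleq P_{\mathrm{FA}}(\mathcal{C}_n, \phi) + e^{-n\alpha} P_{\mathrm{MD}}(\mathcal{C}_n, \phi) && (27) \\
&= \frac{1}{M} \sum_{\mathbf{y}} \left[ \mathbb{I}\{\mathbf{y} \in \mathcal{R}_0\} \sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) + e^{-n\alpha}\, \mathbb{I}\{\mathbf{y} \in \overline{\mathcal{R}_0}\} \sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right]. && (28)
\end{aligned}
$$

Hence,

$$ L(\mathcal{C}_n, \phi, \alpha) \ge L(\mathcal{C}_n, \phi'_\alpha, \alpha) = P_{\mathrm{FA}}(\mathcal{C}_n, \phi'_\alpha) + e^{-n\alpha} P_{\mathrm{MD}}(\mathcal{C}_n, \phi'_\alpha), \tag{29} $$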


^7 See also the extended version [19, Appendix C], which provides a simple proof of the tightness of the random coding exponent of
Slepian-Wolf coding [20]. A very similar method can show the tightness of the random coding exponent of channel codes.
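or, after taking limits,

$$
\begin{aligned}
\lim_{n\to\infty} -\frac{1}{n} \log \overline{L(\mathcal{C}_n, \phi, \alpha)} &= \min\left\{ \overline{E}_{\mathrm{FA}}(\phi),\ \overline{E}_{\mathrm{MD}}(\phi) + \alpha \right\} && (30) \\
&\le \lim_{n\to\infty} -\frac{1}{n} \log \overline{L(\mathcal{C}_n, \phi'_\alpha, \alpha)} && (31) \\
&= \min\left\{ \overline{E}_{\mathrm{FA}}(\phi'_\alpha),\ \overline{E}_{\mathrm{MD}}(\phi'_\alpha) + \alpha \right\}. && (32)
\end{aligned}
$$

Now, assume by contradiction that

$$ \overline{E}_{\mathrm{FA}}(\phi'_\alpha) > \overline{E}_{\mathrm{MD}}(\phi'_\alpha) + \alpha. \tag{33} $$

Then, from continuity of the FA and MD exponents, one can expand $\mathcal{R}'_{0,\alpha}$ to some $\mathcal{R}'_{0,\bar{\alpha}}$ with $\bar{\alpha} < \alpha$ and obtain a
decoder for which

$$ \overline{E}_{\mathrm{MD}}(\phi'_\alpha) + \alpha < \overline{E}_{\mathrm{MD}}(\phi'_{\bar{\alpha}}) + \alpha = \overline{E}_{\mathrm{FA}}(\phi'_{\bar{\alpha}}) < \overline{E}_{\mathrm{FA}}(\phi'_\alpha). \tag{34} $$

Thus,

$$ \overline{L(\mathcal{C}_n, \phi'_\alpha, \alpha)} \mathrel{\dot{>}} \overline{L(\mathcal{C}_n, \phi'_{\bar{\alpha}}, \alpha)}, \tag{35} $$

which contradicts (31), and so

$$ \overline{E}_{\mathrm{FA}}(\phi'_\alpha) \le \overline{E}_{\mathrm{MD}}(\phi'_\alpha) + \alpha. \tag{36} $$

Similarly, it can be shown that the reversed strict inequality in (33) contradicts the optimality of $\phi'_\alpha$, and so (26)
follows. ■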

Remark 5. Consider the following related problem

$$
\begin{aligned}
&\text{minimize} && P_{\mathrm{EE}} \\
&\text{subject to} && P_{\mathrm{FA}} \le \epsilon_{\mathrm{FA}} \\
& && P_{\mathrm{MD}} \le \epsilon_{\mathrm{MD}}
\end{aligned}
\tag{37}
$$

and let $\phi^{**}$ be the optimal detector/decoder for the problem (37). Now, as $P_{\mathrm{IE}} = P_{\mathrm{EE}} + P_{\mathrm{FA}}$, it may be easily
verified that when $P_{\mathrm{FA}} = \epsilon_{\mathrm{FA}}$ for the optimal detector/decoder $\phi^*$ (of the problem (12)), then $\phi^*$ is also the optimal
detector/decoder for the problem (37). However, when $P_{\mathrm{FA}} < \epsilon_{\mathrm{FA}}$ for $\phi^*$, then $\phi^{**}$ is different, since it is easy to check
that for the problem (37), the constraint $P_{\mathrm{FA}} \le \epsilon_{\mathrm{FA}}$ for $\phi^{**}$ must be achieved with equality. To gain some intuition
why (37) is more complicated than (12), see the discussion in [10, Section III].


B. Simplified Detectors/Decoders 

Unfortunately, the asymptotically optimal detector/decoder (17) is very difficult to implement in its current form.
The reason is that the computation of each of the likelihood sums in (17) is usually intractable, as it is the sum of exponentially
many likelihood terms, where each likelihood term is exponentially small. This is in sharp contrast to ordinary
decoders, based on comparison of single likelihood terms which can be carried out in the logarithmic scale, rendering
them numerically feasible. In a recent related work [22] dealing with the optimal erasure/list decoder [18], it was
observed that a much simplified decoder is asymptotically optimal. For the detector/decoder discussed in this paper,
this simplification of (17) implies that the rejection region

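$$ \mathcal{R}''_0 \triangleq \left\{ \mathbf{y} : e^{n\alpha} \max_{Q} N(Q|\mathbf{y})\, e^{n f_W(Q)} < e^{n\beta} \cdot \max_{Q} N(Q|\mathbf{y})\, e^{n f_V(Q)} \right\}, \tag{38} $$

is asymptotically optimal, where the type class enumerators are defined as

$$ N(Q|\mathbf{y}) \triangleq \left| \left\{ \mathbf{x} \in \mathcal{C}_n : \hat{Q}_{\mathbf{xy}} = Q \right\} \right|. \tag{39} $$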
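A minimal sketch of the type class enumerators follows. It counts, for a given output $\mathbf{y}$, how many codewords share each joint type with $\mathbf{y}$, and then checks on a toy example that grouping codewords by joint type reproduces the likelihood sum $\sum_m W(\mathbf{y}|\mathbf{x}_m)$. The sketch assumes that the log-likelihood of a memoryless channel depends on $(\mathbf{x}, \mathbf{y})$ only through their joint type, via $\sum_{x,y} Q(x,y) \log W(y|x)$; the helper names, the codebook and the BSC parameter are illustrative assumptions.

```python
from collections import Counter
from math import exp, isclose, log


def joint_type(x, y):
    """Joint type of (x, y) as sorted ((a, b), count) pairs, so that it is hashable."""
    return tuple(sorted(Counter(zip(x, y)).items()))


def type_class_enumerators(codebook, y):
    """Map each joint type Q to the number of codewords x with joint type Q w.r.t. y."""
    return Counter(joint_type(x, y) for x in codebook)


def bsc(p):
    """BSC transition probabilities keyed by (input, output)."""
    return {(0, 0): 1 - p, (0, 1): p, (1, 0): p, (1, 1): 1 - p}


def log_likelihood_of_type(q_counts, channel):
    """Sum over (a, b) of count(a, b) * log W(b|a), i.e. n times the per-letter average."""
    return sum(c * log(channel[(a, b)]) for (a, b), c in q_counts)


# Hypothetical toy codebook and channel output.
codebook = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)]
y = (0, 0, 1, 1)
N = type_class_enumerators(codebook, y)
W = bsc(0.1)

sum_by_types = sum(cnt * exp(log_likelihood_of_type(q, W)) for q, cnt in N.items())
sum_by_codewords = sum(exp(log_likelihood_of_type(joint_type(x, y), W)) for x in codebook)
print(dict(N))
print(sum_by_types, sum_by_codewords)
```

Storing counts rather than relative frequencies avoids floating-point comparisons when testing two joint types for equality.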


While the above mentioned numerical problem does not arise in $\mathcal{R}''_0$, there is still room for additional simplification
which significantly facilitates implementation, at the cost of degrading the performance, perhaps only slightly. For
zero rate, the type class enumerators cannot increase exponentially, and so either $N(Q|\mathbf{y}) = 0$ or $N(Q|\mathbf{y}) \doteq 1$.
Thus, for low rates, we propose the use of a sub-optimal detector/decoder, which has the following rejection region

$$ \mathcal{R}_{0,\mathrm{L}} \triangleq \left\{ \mathbf{y} : e^{n\alpha} \cdot \max_{1 \le m \le M} W(\mathbf{y}|\mathbf{x}_m) < \max_{1 \le m \le M} V(\mathbf{y}|\mathbf{x}_m) \right\}. \tag{40} $$

We will denote the resulting detector/decoder by 0 l. In this context, this is a generalized likelihood ratio test 
E^ . in which the codeword is the ‘nuisance parameter’ for the detection problem. For high rates (close to the 
capacity of the channel), the output distribution -g Y^m=i ^(yl^m) of a ‘good’ code Il24ll tends to be close to a 
memoryless distribution W = {Px x W)y for some distribution Px- Thus, for high rates, a possible approximation 
is a sub-optimal detector/decoder, which has the following rejection region 


7^o,H = {y : e"“-VF(y) <l/(y)}, (41) 

where V = {Px x W)y- We will denote the resulting detector/decoder by (j)„. 

As was recently demonstrated in ll22l . while (/>l and are much simpler to implement than (f>', they have the 
potential to cause only slight loss in exponents compared to cp'. Since the random coding performance of (p„ is 
simply obtained by the standard analysis of hypothesis testing between two memoryless hypotheses (cf. Subsection 
IV-CI) . we will mainly focus on (pi^. 
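The practical appeal of the low-rate rule (40) is that it only compares maxima of single log-likelihoods. The following is a minimal sketch of that rule, assuming a toy setting that is not part of the original text: finite alphabets indexed by integers, channels given as nested lists `channel[x][y]`, and a small explicit codebook. The function names are hypothetical. For comparison, a sum-based test is evaluated through a log-sum-exp, which avoids underflow but still enumerates every codeword, so it is only usable for small codebooks.

```python
import math

def log_likelihood(y, x, channel):
    """log P(y|x) for a memoryless channel given as channel[x][y]."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = channel[xi][yi]
        if p == 0.0:
            return -math.inf
        total += math.log(p)
    return total

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def reject_low_rate(y, codebook, W, V, alpha):
    """Rule (40): reject iff e^{n alpha} max_m W(y|x_m) < max_m V(y|x_m)."""
    n = len(y)
    best_w = max(log_likelihood(y, x, W) for x in codebook)
    best_v = max(log_likelihood(y, x, V) for x in codebook)
    return n * alpha + best_w < best_v

def reject_sum(y, codebook, W, V, alpha):
    """Sum-based test: reject iff e^{n alpha} sum_m W(y|x_m) < sum_m V(y|x_m)."""
    n = len(y)
    log_w = log_sum_exp([log_likelihood(y, x, W) for x in codebook])
    log_v = log_sum_exp([log_likelihood(y, x, V) for x in codebook])
    return n * alpha + log_w < log_v
```

With two binary symmetric channels and a two-word repetition code, both tests accept an output equal to a codeword and reject an output halfway between the codewords when `alpha = 0`.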


V. Achievable Error Exponents 

In this section, we derive various achievable exponents for the joint detection/decoding problem (12), for a given pair of DMCs $(W, V)$, at rate $R$. In Subsection V-A, we derive the exact random coding performance of the asymptotically optimal detector/decoder $\phi'$. In Subsection V-B, we derive an improved bound for low rates using the expurgation technique. In Subsection V-C, we discuss the exponents achieved by the sub-optimal detectors/decoders $\phi_L$ and $\phi_H$. In Subsection V-D, we provide Gallager/Forney-style lower bounds on the exponents. While these bounds can be loose and only lead to inferior exponents when compared to Subsections V-A and V-B, it is indeed useful to derive them since: (i) they are simpler to compute, since they require solving at most two-dimensional optimization problems$^{8}$, irrespective of the input/output alphabet sizes, (ii) the bounds are translated almost verbatim to memoryless channels with continuous input/output alphabets, like the AWGN channel. For brevity, in most cases the notation of the dependence on the problem parameters (i.e., $R, P_X, \alpha, W, V$) will be omitted, and will be reintroduced only when necessary.


A. Exact Random Coding Exponents 

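We begin with a sequence of definitions. Throughout, $Q$ will represent the joint type of the true transmitted codeword and the output, and $\bar{Q}$ is some type of competing codewords. We denote the normalized log-likelihood ratio of a channel $W$ by

$$f_W(Q) \triangleq \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} Q(x,y) \log W(y|x), \qquad (42)$$

with the convention $f_W(\hat{Q}_{\mathbf{x}\mathbf{y}}) = -\infty$ if $W(\mathbf{y}|\mathbf{x}) = 0$. We define the set

$$\mathcal{Q}_W \triangleq \{ Q : f_W(Q) > -\infty \}, \qquad (43)$$

and for $\gamma \in \mathbb{R}$,

$$s(\bar{Q}_Y, \gamma) \triangleq \min_{Q \in \mathcal{Q}_W :\; Q_Y = \bar{Q}_Y} I(Q) + \left[ -\alpha - f_W(Q) + \gamma \right]_+. \qquad (44)$$

Now, define the sets

$$\mathcal{J}_1 \triangleq \{ Q : f_W(Q) \le -\alpha + f_V(Q) \}, \qquad (45)$$

$$\mathcal{J}_2 \triangleq \{ Q : s(Q_Y, f_V(Q)) \ge R \}, \qquad (46)$$

the exponent

$$E_A \triangleq \min_{Q \in \cap_{i=1}^{2} \mathcal{J}_i} D(Q_{Y|X} \| W | P_X), \qquad (47)$$

the sets

$$\mathcal{K}_1 \triangleq \{ (Q, \bar{Q}) : \bar{Q}_Y = Q_Y \}, \qquad (48)$$

$$\mathcal{K}_2 \triangleq \{ (Q, \bar{Q}) : f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q}) \}, \qquad (49)$$

$$\mathcal{K}_3 \triangleq \{ (Q, \bar{Q}) : f_V(\bar{Q}) \ge \alpha + f_W(Q) - [R - I(\bar{Q})]_+ \}, \qquad (50)$$

$$\mathcal{K}_4 \triangleq \{ (Q, \bar{Q}) : s(Q_Y, f_V(\bar{Q}) + [R - I(\bar{Q})]_+) \ge R \}, \qquad (51)$$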
$^{8}$When there are no input constraints. When input constraints are given, as e.g. in the power-limited AWGN channel, it is required to solve a four-dimensional optimization problem (see the discussion of continuous alphabet channels in Subsection V-E).


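and the exponent

$$E_B \triangleq \min_{(Q, \bar{Q}) \in \cap_{i=1}^{4} \mathcal{K}_i} \left\{ D(Q_{Y|X} \| W | P_X) + [I(\bar{Q}) - R]_+ \right\}. \qquad (52)$$

In addition, let us define the type-enumeration detection random coding exponent as

$$E_T^{RC}(R, \alpha, P_X, W, V) \triangleq \min \{ E_A, E_B \}. \qquad (53)$$

Theorem 6. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi^*) \ge E_T^{RC}(R, \alpha, P_X, W, V) - \delta, \qquad (54)$$

$$E_{MD}(\mathcal{C}, \phi^*) \ge E_T^{RC}(R, \alpha, P_X, W, V) - \alpha - \delta. \qquad (55)$$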

The main challenge in analyzing the random coding FA exponent is that the likelihoods of both hypotheses, namely $\sum_{m=1}^{M} W(\mathbf{Y}|\mathbf{X}_m)$ and $\sum_{m=1}^{M} V(\mathbf{Y}|\mathbf{X}_m)$, are very correlated, due to the fact that once the codewords are drawn, they are common to both likelihoods. This is significantly different from the situation in [10], in which the likelihood $\sum_{m=1}^{M} W(\mathbf{Y}|\mathbf{X}_m)$ was compared to a likelihood $Q_0(\mathbf{Y})$ of a completely different distribution$^{9}$.

We first make the following observation.

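Fact 7. For the detector/decoder $\phi'$,

$$P_{FA}(\mathcal{C}_n, \phi') = \mathbb{P}_W\left( \mathbf{Y} \in \mathcal{R}'_0 \right) \qquad (56)$$

$$= \mathbb{P}_W\left( \frac{\sum_{m=1}^{M} W(\mathbf{Y}|\mathbf{x}_m)}{\sum_{m=1}^{M} V(\mathbf{Y}|\mathbf{x}_m)} < e^{-n\alpha} \right), \qquad (57)$$

where $\mathbb{P}_W(\mathcal{A})$ is the probability of the event $\mathcal{A}$ under the hypothesis that the channel is $W$. Similarly,

$$P_{MD}(\mathcal{C}_n, \phi') = \mathbb{P}_V\left( \mathbf{Y} \notin \mathcal{R}'_0 \right) \qquad (58)$$

$$= \mathbb{P}_V\left( \frac{\sum_{m=1}^{M} W(\mathbf{Y}|\mathbf{x}_m)}{\sum_{m=1}^{M} V(\mathbf{Y}|\mathbf{x}_m)} \ge e^{-n\alpha} \right) \qquad (59)$$

$$= \mathbb{P}_V\left( \frac{\sum_{m=1}^{M} V(\mathbf{Y}|\mathbf{x}_m)}{\sum_{m=1}^{M} W(\mathbf{Y}|\mathbf{x}_m)} \le e^{n\alpha} \right). \qquad (60)$$

Thus, the random coding MD exponent can be obtained by replacing $\alpha$ with $-\alpha$, and $W$ with $V$ in the FA exponent, i.e.

$$\lim_{l \to \infty} -\frac{1}{n_l} \log \overline{P}_{MD}(\mathcal{C}_{n_l}, \phi^*) = E_T^{RC}(R, -\alpha, P_X, V, W), \qquad (61)$$

where $\{n_l\}$ is the sub-sequence of blocklengths such that $\mathcal{T}(P_X)$ is not empty.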

Before rigorously proving Theorem 6, we make a short detour to present the type class enumerators concept [14], and also derive two useful lemmas. Recall that when analyzing the performance of a randomly chosen code, a common method is to first evaluate the error probability conditioned on the transmitted codeword (assumed, without loss of generality, to be $\mathbf{x}_1$) and the output vector $\mathbf{y}$, and average only over $\{\mathbf{X}_m\}_{m=2}^{M}$. Afterwards, the ensemble average error probability is obtained by averaging w.r.t. the random choice of $(\mathbf{X}_1, \mathbf{Y})$. We will assume that the codewords are drawn randomly and uniformly from $\mathcal{T}(P_X)$, and so all joint types $Q$ mentioned henceforth will satisfy $Q_X = P_X$, even if this is not explicitly displayed.

$^{9}$In [10], $Q_0(\mathbf{Y})$ represented the hypothesis that no codeword was transmitted and only noise was received.

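To analyze the conditional error probability, it is useful [14] to define the type class enumerators

$$N(Q|\mathbf{y}) \triangleq \left| \left\{ \mathbf{x} \in \mathcal{C}_n \backslash \mathbf{x}_1 : \hat{Q}_{\mathbf{x}\mathbf{y}} = Q \right\} \right|, \qquad (62)$$

which, for a given $\mathbf{y}$, count the number of codewords, excluding $\mathbf{x}_1$, which have joint type $Q$ with $\mathbf{y}$. As the codewords in the ensemble are drawn independently, $N(Q|\mathbf{y})$ is a binomial random variable pertaining to $M = \lceil e^{nR} \rceil$ trials and probability of success of the exponential order of $e^{-nI(Q)}$, and consequently, $\mathbb{E}[N(Q|\mathbf{y})] \doteq \exp[n(R - I(Q))]$. A more refined analysis, similar to the one carried out in [14, Subsection 6.3], shows that for any given $u \in \mathbb{R}$

$$\mathbb{P}\left\{ N(Q|\mathbf{y}) \ge e^{nu} \right\} \doteq \exp\left\{ -e^{n[u]_+} \left( n\left[ I(Q) - R + [u]_+ \right] - 1 \right) \right\}. \qquad (63)$$

Consequently, if $I(Q) < R$, $N(Q|\mathbf{y})$ concentrates double-exponentially rapidly around its average $\doteq e^{n[R - I(Q)]}$, and if $I(Q) > R$, then with probability tending to 1 we have $N(Q|\mathbf{y}) = 0$, and $\mathbb{P}\{N(Q|\mathbf{y}) \ge 1\} \doteq e^{-n[I(Q) - R]}$, as well as $\mathbb{P}\{N(Q|\mathbf{y}) \ge e^{nu}\} \doteq e^{-n\infty}$ for any $u > 0$.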
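The two regimes just described can be checked on small examples. The sketch below is illustrative only and rests on simplifying assumptions that are not in the original: joint types are represented by unnormalized pair counts, the number of trials $e^{nR}$ is treated as a real number, and the success probability is taken to be exactly $e^{-nI(Q)}$, ignoring sub-exponential factors. The function names are hypothetical.

```python
import math
from collections import Counter

def joint_type(x, y):
    """Joint empirical counts of the pairs (x_i, y_i), as a hashable key."""
    return tuple(sorted(Counter(zip(x, y)).items()))

def type_class_enumerators(codebook, y, transmitted=0):
    """N(Q|y) for every joint type Q seen among the competing codewords."""
    return Counter(joint_type(x, y)
                   for m, x in enumerate(codebook) if m != transmitted)

def prob_enumerator_positive(n, R, I):
    """P(N >= 1) for N ~ Binomial(e^{nR}, e^{-nI}), computed stably."""
    trials = math.exp(n * R)          # assumption: real-valued trial count
    p = math.exp(-n * I)              # assumption: exact exponential order
    # 1 - (1 - p)^trials, via log1p/expm1 to avoid cancellation
    return -math.expm1(trials * math.log1p(-p))
```

When $I(Q) > R$ the returned probability decays with exponent close to $I(Q) - R$, and when $I(Q) < R$ it is numerically indistinguishable from one, matching the behaviour stated above.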

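We now derive two useful lemmas. In the first lemma, we show that if a single joint type $Q$ is excluded from the possible joint types for a randomly chosen codeword $\mathbf{X}$ and $\mathbf{y}$, then the probability of drawing some other joint type is not significantly different from its unconditional counterpart. In the second lemma, we characterize the behavior of the probability of the intersection of events in which the type class enumerators are upper bounded.

Lemma 8. For any $\bar{Q} \ne Q$

$$\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = \bar{Q} \,\middle|\, \hat{Q}_{\mathbf{X}\mathbf{y}} \ne Q \right) \doteq \mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = \bar{Q} \right) \doteq e^{-nI(\bar{Q})}. \qquad (64)$$

Proof: For any given $Q$

$$\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = Q \right) \doteq e^{-nI(Q)}, \qquad (65)$$

and if $I(Q) = 0$ then

$$\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = Q \right) \to 0, \qquad (66)$$

as $n \to \infty$, although sub-exponentially [?, Problem 2.2]. Thus, for any $\bar{Q} \ne Q$,

$$\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = \bar{Q} \,\middle|\, \hat{Q}_{\mathbf{X}\mathbf{y}} \ne Q \right) = \frac{\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = \bar{Q},\, \hat{Q}_{\mathbf{X}\mathbf{y}} \ne Q \right)}{\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} \ne Q \right)} \qquad (67)$$

$$= \frac{\mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = \bar{Q} \right)}{1 - \mathbb{P}\left( \hat{Q}_{\mathbf{X}\mathbf{y}} = Q \right)} \qquad (68)$$

$$\doteq e^{-nI(\bar{Q})}. \qquad (69)$$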

Lemma 9. Let a set $\mathcal{Q}$ of joint types, a continuous function $J(Q)$ in $\mathcal{Q}$, and a type $\bar{Q}_Y$ be given. Let $\{\tilde{N}(Q|\mathbf{y})\}_{Q \in \mathcal{Q}}$ be a sequence of sets of binomial random variables pertaining to $K_n$ trials and probability of success $p_n$. Then, if $K_n \doteq e^{nR}$ and $p_n \doteq e^{-nI(Q)}$,

$$\mathbb{P}\left( \bigcap_{Q \in \mathcal{Q}:\; Q_Y = \bar{Q}_Y} \left\{ \tilde{N}(Q|\mathbf{y}) < e^{nJ(Q)} \right\} \right) \begin{cases} = 1 - o(n), & S(\bar{Q}_Y; J, \mathcal{Q}) > R \\ \doteq e^{-n\infty}, & \text{otherwise} \end{cases} \qquad (70)$$

where $\mathbf{y} \in \mathcal{T}(\bar{Q}_Y)$, and

$$S(\bar{Q}_Y; J, \mathcal{Q}) \triangleq \min_{Q \in \mathcal{Q}:\; Q_Y = \bar{Q}_Y} I(Q) + [J(Q)]_+. \qquad (71)$$

Proof: A similar statement was proved in [10, pp. 5086-5087], but for the sake of completeness, we include its short proof. If there exists at least one $Q \in \mathcal{Q}$ with $Q_Y = \bar{Q}_Y$ for which $I(Q) < R$ and $R - I(Q) > J(Q)$, then this $Q$ alone is responsible for a double exponential decay of the intersection probability, because then the event in question would be a large deviations event whose probability decays exponentially with $M = e^{nR}$, thus double-exponentially with $n$, let alone the intersection over all $Q \in \mathcal{Q}$. The condition for this to happen is $R > S(\bar{Q}_Y; J, \mathcal{Q})$. Conversely, if for every $Q \in \mathcal{Q}$ with $Q_Y = \bar{Q}_Y$, we have $I(Q) > R$ or $R - I(Q) < J(Q)$, i.e., $R < S(\bar{Q}_Y; J, \mathcal{Q})$, then the intersection probability is close to 1, since the intersection is over a sub-exponential number of events with very high probability. Thus (70) follows. ■

Remark 10. A natural choice for $\tilde{N}(Q|\mathbf{y})$ is simply $N(Q|\mathbf{y})$. However, in what follows, we will need to analyze a conditional version of the type enumerators, namely, events of the form $\{N(Q|\mathbf{y}) = N_1 \,|\, N(\bar{Q}|\mathbf{y}) = N_2\}$ for some $0 \le N_1, N_2 \le M$. As Lemma 8 above hints, in some cases the conditional distribution of $N(Q|\mathbf{y})$ is asymptotically the same as the unconditional distribution. In this respect, it should be noted that the result of Lemma 9 is proved using the marginal distribution of each $\tilde{N}(Q|\mathbf{y})$ alone, and not their joint distribution. It should also be noted that the second argument of $S(\bar{Q}_Y; \cdot, \cdot)$ in (71) is a function of the joint type $Q$, and the third argument is a set of joint types. Finally, since the types are dense in the subspace of the simplex of all the types satisfying $Q_Y = \bar{Q}_Y$, the exclusion of a single type from the intersection in (70) does not change the result of the lemma.


Remark 11. As $Q_X = P_X$, the minimization in (71) is in fact over the variables $\{Q_{Y|X}(y|x)\}_{x \in \mathcal{X},\, y \in \mathcal{Y}}$. Thus, whenever $J(Q)$ is convex in $Q_{Y|X}$, then

$$S(\bar{Q}_Y; J, \mathcal{Q}) = \min_{Q \in \mathcal{Q}:\; Q_Y = \bar{Q}_Y} \; \max_{0 \le \lambda \le 1} \left[ I(Q) + \lambda J(Q) \right] \qquad (72)$$

$$\overset{(a)}{=} \max_{0 \le \lambda \le 1} \; \min_{Q \in \mathcal{Q}:\; Q_Y = \bar{Q}_Y} \left[ I(Q) + \lambda J(Q) \right], \qquad (73)$$

where (a) is by the minimax theorem [25], as both $I(Q)$ and $J(Q)$ are convex in $Q_{Y|X}$, and the minimization set involves only linear constraints and is thus convex. This dual form is simpler to compute than (71), since the inner minimization in (73) is a convex optimization problem [26], and the outer maximization problem requires only a simple line-search. Note that the function $s(\bar{Q}_Y, \gamma)$ is a specific instance of $S(\bar{Q}_Y; \cdot, \cdot)$ defined in (71) with $\mathcal{Q} = \mathcal{Q}_W$ and $J(Q) = -\alpha - f_W(Q) + \gamma$, which is convex in $Q_{Y|X}$ (in fact, linear).
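To make the dual form concrete, the following sketch compares the primal expression $\min_Q I(Q) + [J(Q)]_+$ with the dual $\max_\lambda \min_Q [I(Q) + \lambda J(Q)]$ for binary alphabets, where the constraints $Q_X = P_X$ and $Q_Y = \bar{Q}_Y$ leave a single free parameter. It is a brute-force grid search rather than the convex solver and line-search suggested above, and it assumes a strictly positive channel matrix `W[x][y]` and an output marginal strictly inside $(0,1)$; all names are hypothetical.

```python
import math

def binary_kl(p, q):
    """D(Bernoulli(p) || Bernoulli(q)) in nats, with 0 log 0 = 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def primal_and_dual(p0, qy0, W, alpha, gamma, grid=401):
    """Grid evaluation of min_Q I + [J]_+ and max_lam min_Q I + lam*J.

    p0 = P_X(0), qy0 = target Q_Y(0), J(Q) = -alpha - f_W(Q) + gamma.
    """
    p1 = 1.0 - p0
    lo = max(0.0, (qy0 - p1) / p0)        # feasible range of a = Q(y=0|x=0)
    hi = min(1.0, qy0 / p0)
    points = []
    for k in range(grid):
        a = lo + (hi - lo) * k / (grid - 1)
        b = min(1.0, max(0.0, (qy0 - p0 * a) / p1))   # Q(y=0|x=1)
        info = p0 * binary_kl(a, qy0) + p1 * binary_kl(b, qy0)
        f_w = (p0 * (a * math.log(W[0][0]) + (1 - a) * math.log(W[0][1]))
               + p1 * (b * math.log(W[1][0]) + (1 - b) * math.log(W[1][1])))
        points.append((info, -alpha - f_w + gamma))
    primal = min(i + max(j, 0.0) for i, j in points)
    dual = max(min(i + lam * j for i, j in points)
               for lam in (k / (grid - 1) for k in range(grid)))
    return primal, dual
```

On the grid the dual value never exceeds the primal one (weak duality), and for a smooth problem such as a binary symmetric channel the two agree up to the grid resolution, as the minimax theorem predicts.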

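We are now ready to prove Theorem 6.

Proof of Theorem 6: We begin by analyzing the FA exponent. Assume, without loss of generality, that the first message is transmitted. Let us condition on the event $\mathbf{X}_1 = \mathbf{x}_1$ and $\mathbf{Y} = \mathbf{y}$, and analyze the average over the ensemble of fixed composition codes of type $P_X$. For brevity, we will denote $Q = \hat{Q}_{\mathbf{x}_1\mathbf{y}}$. The average conditional FA probability for the decoder $\phi'$ with parameter $\alpha$ is given by

$$\overline{P}_{FA}(\mathbf{x}_1, \mathbf{y}) \triangleq \mathbb{P}\left( \mathbf{y} \in \mathcal{R}'_0 \,\middle|\, \mathbf{X}_1 = \mathbf{x}_1, \mathbf{Y} = \mathbf{y} \right) \qquad (74)$$

$$\overset{(a)}{=} \mathbb{P}\left( W(\mathbf{y}|\mathbf{x}_1) + \sum_{m=2}^{M} W(\mathbf{y}|\mathbf{X}_m) < e^{-n\alpha} \cdot V(\mathbf{y}|\mathbf{x}_1) + e^{-n\alpha} \cdot \sum_{m=2}^{M} V(\mathbf{y}|\mathbf{X}_m) \right) \qquad (75)$$

$$\overset{(UR)}{\doteq} \mathbb{P}\left( W(\mathbf{y}|\mathbf{x}_1) + \sum_{m=2}^{M} W(\mathbf{y}|\mathbf{X}_m) < e^{-n\alpha} \cdot V(\mathbf{y}|\mathbf{x}_1) \right) + \mathbb{P}\left( W(\mathbf{y}|\mathbf{x}_1) + \sum_{m=2}^{M} W(\mathbf{y}|\mathbf{X}_m) < e^{-n\alpha} \cdot \sum_{m=2}^{M} V(\mathbf{y}|\mathbf{X}_m) \right) \qquad (76)$$

$$\overset{(IR)}{\doteq} \mathbb{P}\left( \sum_{m=2}^{M} W(\mathbf{y}|\mathbf{X}_m) < e^{-n\alpha} \cdot V(\mathbf{y}|\mathbf{x}_1) \right) \cdot \mathbb{I}\left\{ W(\mathbf{y}|\mathbf{x}_1) < e^{-n\alpha} \cdot V(\mathbf{y}|\mathbf{x}_1) \right\} + \mathbb{P}\left( W(\mathbf{y}|\mathbf{x}_1) + \sum_{m=2}^{M} W(\mathbf{y}|\mathbf{X}_m) < e^{-n\alpha} \cdot \sum_{m=2}^{M} V(\mathbf{y}|\mathbf{X}_m) \right) \qquad (77)$$

$$= \mathbb{P}\left( \sum_{\bar{Q}} N(\bar{Q}|\mathbf{y}) e^{n f_W(\bar{Q})} < e^{-n\alpha} e^{n f_V(Q)} \right) \cdot \mathbb{I}\left\{ f_W(Q) < -\alpha + f_V(Q) \right\} + \mathbb{P}\left( e^{n f_W(Q)} + \sum_{\bar{Q}} N(\bar{Q}|\mathbf{y}) e^{n f_W(\bar{Q})} < e^{-n\alpha} \sum_{\bar{Q}} N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right) \qquad (78)$$

$$\triangleq A(Q) + B(Q) \qquad (79)$$

$$\doteq \max\left\{ A(Q), B(Q) \right\}, \qquad (80)$$

where $A(Q)$ and $B(Q)$ were implicitly defined, and (a) is because $\{\mathbf{X}_m\}_{m=2}^{M}$ are chosen independently of $(\mathbf{X}_1, \mathbf{Y})$. For the first term,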


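$$A(Q) \overset{(IR)}{\doteq} \mathbb{P}\left( \bigcap_{\bar{Q}:\; f_W(\bar{Q}) > -\infty} \left\{ N(\bar{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(Q) - f_W(\bar{Q})]} \right\} \right) \cdot \mathbb{I}\left\{ f_W(Q) < -\alpha + f_V(Q) \right\} \qquad (81)$$

$$\overset{(a)}{\doteq} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(Q) - f_W(\bar{Q}), \mathcal{Q}_W \right) > R \right\} \cdot \mathbb{I}\left\{ f_W(Q) < -\alpha + f_V(Q) \right\}, \qquad (82)$$

where (a) is by Lemma 9. Upon averaging over $(\mathbf{X}_1, \mathbf{Y})$, we obtain the exponent $E_A$ of (47), when utilizing the definition in (44). Moving on to the second term, we first assume that $e^{n f_W(Q)} > 0$. Then,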


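$$B(Q) \overset{(UR)}{\doteq} \sum_{\bar{Q}} \mathbb{P}\left( e^{n f_W(Q)} + \sum_{\tilde{Q}} N(\tilde{Q}|\mathbf{y}) e^{n f_W(\tilde{Q})} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right) \qquad (83)$$

$$\overset{(IR)}{\doteq} \sum_{\bar{Q}} \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}} \left\{ N(\tilde{Q}|\mathbf{y}) e^{n f_W(\tilde{Q})} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right\} \cap \left\{ N(\bar{Q}|\mathbf{y}) e^{n f_W(\bar{Q})} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right\} \cap \left\{ e^{n f_W(Q)} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right\} \right) \qquad (84)$$

$$\overset{(a)}{=} \sum_{\bar{Q}:\; f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q})} \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}} \left\{ N(\tilde{Q}|\mathbf{y}) e^{n f_W(\tilde{Q})} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right\} \cap \left\{ e^{n f_W(Q)} < e^{-n\alpha} \cdot N(\bar{Q}|\mathbf{y}) e^{n f_V(\bar{Q})} \right\} \right) \qquad (85)$$

$$\overset{(b)}{=} \sum_{\bar{Q}:\; f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q})} \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \right) \qquad (86)$$

$$\triangleq \sum_{\bar{Q}:\; f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q})} \zeta(\bar{Q}), \qquad (87)$$

where (a) is since when $f_W(\bar{Q}) > -\alpha + f_V(\bar{Q})$ the second event in the intersection implies $N(\bar{Q}|\mathbf{y}) = 0$, but this implies that the third event does not occur, and in (b) we have rearranged the terms. To continue the analysis of the exponential behavior of $B(Q)$, we split the analysis into three cases: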

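Case 1: $0 < I(\bar{Q}) < R$. For any $0 < \epsilon < R - I(\bar{Q})$ let

$$\mathcal{G}_n \triangleq \left\{ e^{n[R - I(\bar{Q}) - \epsilon]} \le N(\bar{Q}|\mathbf{y}) \le e^{n[R - I(\bar{Q}) + \epsilon]} \right\}, \qquad (88)$$

which satisfies $\mathbb{P}[\mathcal{G}_n] \doteq 1$. Thus,

$$\zeta(\bar{Q}) = \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \right) \qquad (89)$$

$$\le \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \,\middle|\, \mathcal{G}_n \right) \mathbb{P}(\mathcal{G}_n) + \mathbb{P}(\mathcal{G}_n^c) \qquad (90)$$

$$\doteq \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \,\middle|\, \mathcal{G}_n \right) \qquad (91)$$

$$\le \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}) + \epsilon]} \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q) + R - I(\bar{Q}) + \epsilon]} \right\} \,\middle|\, \mathcal{G}_n \right) \qquad (92)$$

$$\overset{(a)}{\doteq} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}) + \epsilon, \mathcal{Q}_W \right) > R \right\} \times \mathbb{I}\left\{ -\alpha + f_V(\bar{Q}) - f_W(Q) + R - I(\bar{Q}) + \epsilon > 0 \right\}, \qquad (93)$$

where (a) is since conditioned on $\mathcal{G}_n$, $N(\tilde{Q}|\mathbf{y})$ is a binomial random variable with probability of success $\doteq e^{-nI(\tilde{Q})}$ (see Lemma 8), and more than $e^{nR} - e^{n[R - I(\bar{Q}) + \epsilon]} \doteq e^{nR}$ trials (whenever $\tilde{Q}_Y = Q_Y$, and $N(\tilde{Q}|\mathbf{y}) = 0$ otherwise), and by using Lemma 9 and Remark 10$^{10}$. Similarly,

$$\zeta(\bar{Q}) = \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \right) \qquad (94)$$

$$\ge \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \,\middle|\, \mathcal{G}_n \right) \mathbb{P}(\mathcal{G}_n) \qquad (95)$$

$$\doteq \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \,\middle|\, \mathcal{G}_n \right) \qquad (96)$$

$$\ge \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}) - \epsilon]} \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q) + R - I(\bar{Q}) - \epsilon]} \right\} \,\middle|\, \mathcal{G}_n \right) \qquad (97)$$

$$\overset{(a)}{\doteq} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}) - \epsilon, \mathcal{Q}_W \right) > R \right\} \times \mathbb{I}\left\{ -\alpha + f_V(\bar{Q}) - f_W(Q) + R - I(\bar{Q}) - \epsilon > 0 \right\}, \qquad (98)$$

where (a) is now since conditioned on $\mathcal{G}_n$, $N(\tilde{Q}|\mathbf{y})$ is a binomial random variable, with probability of success $\doteq e^{-nI(\tilde{Q})}$ (see Lemma 8), and less than $e^{nR}$ trials (whenever $\tilde{Q}_Y = Q_Y$, and $N(\tilde{Q}|\mathbf{y}) = 0$ otherwise), and by utilizing again Lemma 9 and Remark 10. As $\epsilon > 0$ is arbitrary,

$$\zeta(\bar{Q}) \doteq \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}), \mathcal{Q}_W \right) > R \right\} \times \mathbb{I}\left\{ -\alpha + f_V(\bar{Q}) - f_W(Q) + R - I(\bar{Q}) > 0 \right\}. \qquad (99)$$

$^{10}$We have also implicitly used the following obvious monotonicity property: If $N_1$ and $N_2$ are two binomial random variables pertaining to the same probability of success but the number of trials of $N_1$ is larger than the number of trials of $N_2$, then $\mathbb{P}(N_1 < L) \le \mathbb{P}(N_2 < L)$.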

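Case 2: Assume that $I(\bar{Q}) = 0$. This case is not significantly different from Case 1. Indeed, for any $0 < \epsilon < R$, let

$$\mathcal{G}_n \triangleq \left\{ e^{n(R - \epsilon)} \le N(\bar{Q}|\mathbf{y}) \le e^{nR} \right\}, \qquad (100)$$

then $\mathbb{P}[\mathcal{G}_n] \doteq 1$. To see this, we note that for $\mathbf{X}_l$ drawn uniformly within $\mathcal{T}(P_X)$,

$$\mathbb{E}\left[ N(\bar{Q}|\mathbf{y}) \right] = \left( \lceil e^{nR} \rceil - 1 \right) \cdot \mathbb{P}\left( \hat{Q}_{\mathbf{X}_l \mathbf{y}} = \bar{Q} \right) \qquad (101)$$

$$\overset{(a)}{\le} \frac{1}{2} e^{nR} \qquad (102)$$

for all $n$ sufficiently large, where (a) is since $\mathbb{P}(\hat{Q}_{\mathbf{X}_l \mathbf{y}} = \bar{Q}) \to 0$ as $n \to \infty$. So, by the Markov inequality

$$\mathbb{P}\left\{ N(\bar{Q}|\mathbf{y}) \le e^{nR} \right\} \ge \mathbb{P}\left\{ N(\bar{Q}|\mathbf{y}) \le 2\mathbb{E}\left[ N(\bar{Q}|\mathbf{y}) \right] \right\} \ge \frac{1}{2}. \qquad (103)$$

Since, as before, $\mathbb{P}\{ e^{n(R - \epsilon)} \le N(\bar{Q}|\mathbf{y}) \} \doteq 1$, and the intersection of two high probability sets also has high probability, we obtain $\mathbb{P}[\mathcal{G}_n] \doteq 1$. The rest of the analysis follows as in Case 1, and the result is the same, when setting $I(\bar{Q}) = 0$.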

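Case 3: Assume that $I(\bar{Q}) > R$. Then, for any $\epsilon > 0$

$$\zeta(\bar{Q}) = \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \right) \qquad (104)$$

$$\overset{(a)}{\doteq} \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \cdot N(\bar{Q}|\mathbf{y}) \right\} \,\middle|\, 1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon} \right) \cdot \mathbb{P}\left( 1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon} \right) \qquad (105)$$

$$\overset{(b)}{\ge} \mathbb{P}\left( \bigcap_{\tilde{Q} \ne \bar{Q}:\; f_W(\tilde{Q}) > -\infty} \left\{ N(\tilde{Q}|\mathbf{y}) < e^{n[-\alpha + f_V(\bar{Q}) - f_W(\tilde{Q})]} \right\} \cap \left\{ 1 < e^{n[-\alpha + f_V(\bar{Q}) - f_W(Q)]} \right\} \,\middle|\, 1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon} \right) \cdot e^{-n(I(\bar{Q}) - R)} \qquad (106)$$

$$\overset{(c)}{\doteq} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}), \mathcal{Q}_W \right) > R \right\} \mathbb{I}\left\{ -\alpha + f_V(\bar{Q}) - f_W(Q) > 0 \right\} e^{-n(I(\bar{Q}) - R)}, \qquad (107)$$

where (a) is since conditioned on $N(\bar{Q}|\mathbf{y}) = 0$ the probability of the event is 0, and

$$\mathbb{P}\left[ N(\bar{Q}|\mathbf{y}) > e^{n\epsilon} \right] \doteq 0, \qquad (108)$$

(b) is since

$$\mathbb{P}\left( 1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon} \right) \ge \mathbb{P}\left( N(\bar{Q}|\mathbf{y}) = 1 \right) \qquad (109)$$

$$\doteq e^{-n(I(\bar{Q}) - R)}, \qquad (110)$$

and (c) is since conditioned on $1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon}$, $N(\tilde{Q}|\mathbf{y})$ is a binomial random variable, with probability of success $\doteq e^{-nI(\tilde{Q})}$ (see Lemma 8), and $\doteq e^{nR}$ trials (whenever $\tilde{Q}_Y = Q_Y$, and $N(\tilde{Q}|\mathbf{y}) = 0$ otherwise), and by utilizing once again Lemma 9 and Remark 10. Similarly, using

$$\mathbb{P}\left( 1 \le N(\bar{Q}|\mathbf{y}) \le e^{n\epsilon} \right) \le \mathbb{P}\left( N(\bar{Q}|\mathbf{y}) \ge 1 \right) \doteq e^{-n(I(\bar{Q}) - R)}, \qquad (111)$$

the same analysis as in the previous case shows a reversed inequality. As $\epsilon > 0$ is arbitrary, then

$$\zeta(\bar{Q}) \doteq \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}), \mathcal{Q}_W \right) > R \right\} \mathbb{I}\left\{ -\alpha + f_V(\bar{Q}) - f_W(Q) > 0 \right\} e^{-n(I(\bar{Q}) - R)}. \qquad (112)$$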

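Returning to (87), we obtain that $B(Q)$ is exponentially equal to the maximum between

$$\max_{\bar{Q}:\; f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q}),\; I(\bar{Q}) \le R,\; f_V(\bar{Q}) \ge \alpha + f_W(Q) - R + I(\bar{Q})} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + R - I(\bar{Q}), \mathcal{Q}_W \right) > R \right\}, \qquad (113)$$

and

$$\max_{\bar{Q}:\; f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q}),\; I(\bar{Q}) > R,\; f_V(\bar{Q}) \ge \alpha + f_W(Q)} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}), \mathcal{Q}_W \right) > R \right\} e^{-n(I(\bar{Q}) - R)}, \qquad (114)$$

or, more succinctly,

$$B(Q) \doteq \max_{\bar{Q}} \mathbb{I}\left\{ S\left( Q_Y; -\alpha + f_V(\bar{Q}) - f_W(\tilde{Q}) + [R - I(\bar{Q})]_+, \mathcal{Q}_W \right) > R \right\} e^{-n[I(\bar{Q}) - R]_+}, \qquad (115)$$

where the maximization is over

$$\left\{ \bar{Q} : f_W(\bar{Q}) \le -\alpha + f_V(\bar{Q}),\; f_V(\bar{Q}) \ge \alpha + f_W(Q) - [R - I(\bar{Q})]_+ \right\}. \qquad (116)$$

Now, in the evaluation of $B(Q)$ we have assumed that $e^{n f_W(Q)} > 0$. However, there is no need to analyze the case $e^{n f_W(Q)} = 0$, since as

$$f_W(Q) = -D(Q_{Y|X} \| W | P_X) - H_Q(Y|X) \qquad (117)$$

and $H_Q(Y|X) \le \log|\mathcal{Y}| < \infty$, then $e^{n f_W(Q)} = 0$ implies $\mathbb{P}(\hat{Q}_{\mathbf{X}_1\mathbf{Y}} = Q) \doteq \exp[-nD(Q_{Y|X} \| W | P_X)] = e^{-n\infty}$. Thus, upon averaging over $(\mathbf{X}_1, \mathbf{Y})$ we obtain the exponent $E_B$ of (52), utilizing (44). Then, we obtain the required result from (80).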

Next, for the MD exponent, we observe that as $E_T^{RC}(R, \alpha, P_X, W, V)$ is continuous in $\alpha$, Fact 7 above implies that the MD exponent will also be continuous in $\alpha$. So, Proposition 4 implies that when the codewords are drawn from a fixed composition ensemble with distribution $P_X$,

$$\lim_{l \to \infty} -\frac{1}{n_l} \log \overline{P}_{MD}(\mathcal{C}_{n_l}, \phi^*) = E_T^{RC}(R, \alpha, P_X, W, V) - \alpha. \qquad (118)$$

Finally, the continuity of $E_T^{RC}(R, \alpha, P_X, W, V)$ in $P_X$ implies that for all sufficiently large $n$, one can find a distribution $P'_X$ close enough to $P_X$ such that (54) and (55) hold, which completes the proof of the theorem. ■

To keep the flow of the proof, we have omitted a technical point which we now address.

Remark 12. The ensemble average FA probability should be obtained by averaging $\overline{P}_{FA}(\mathbf{X}_1, \mathbf{Y})$ w.r.t. $(\mathbf{X}_1, \mathbf{Y})$. However, we have averaged its asymptotic equivalence in the exponential scale, resulting from analyzing the terms $A(Q)$ and $B(Q)$. Thus, in a sense, we have interchanged the expectation and limit order. This is possible due to the fact that all the asymptotic equivalence relations become tight for $n$ sufficiently large, which does not depend on $Q$ (i.e. on $(\mathbf{X}_1, \mathbf{Y})$). Indeed, the union and intersection rules add a negligible term to the exponent. This term depends only on the number of types, which is polynomial in $n$, independent of the specific type $Q$. The asymptotic equivalence relations that stem from Lemma 9 do not depend on $Q$, as functions of $Q$ only play the role of bounds on the sums of weighted type enumerators. Indeed, it is evident from the proof of Lemma 9 that the required blocklength $n$ to approach convergence of the probability does not depend on $J(Q)$.

B. Expurgated Exponents

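We begin again with several definitions. Throughout, $P_{X\tilde{X}}$ will represent a joint type of a pair of codewords. Let us define the Chernoff distance$^{11}$

$$d_s(x, \tilde{x}) \triangleq -\log\left( \sum_{y \in \mathcal{Y}} W^{1-s}(y|x) V^{s}(y|\tilde{x}) \right), \qquad (119)$$

and the set

$$\mathcal{L} \triangleq \left\{ P_{X\tilde{X}} : P_{\tilde{X}} = P_X,\; I(P_{X\tilde{X}}) \le R \right\}. \qquad (120)$$

In addition, let us define the type-enumeration detection expurgated exponent as

$$E_T^{EX}(R, \alpha, P_X, W, V) \triangleq \max_{0 \le s \le 1} \; \min_{P_{X\tilde{X}} \in \mathcal{L}} \left\{ \alpha s + \mathbb{E}\left[ d_s(X, \tilde{X}) \right] + I(P_{X\tilde{X}}) - R \right\}. \qquad (121)$$

Theorem 13. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi^*) \ge E_T^{EX}(R, \alpha, P_X, W, V) - \delta, \qquad (122)$$

$$E_{MD}(\mathcal{C}, \phi^*) \ge E_T^{EX}(R, \alpha, P_X, W, V) - \alpha - \delta. \qquad (123)$$

The proof can be found in Appendix A.

Remark 14. Hölder's inequality shows that $d_s(x, \tilde{x}) \ge 0$. In (121), there is freedom to maximize over $0 \le s \le 1$, and naturally, $s = \frac{1}{2}$ is a valid choice. Due to the symmetry of $d_s(x, \tilde{x})$ in $s$ around $s = \frac{1}{2}$ when $W = V$, for the ordinary decoding exponent, the optimal choice is $s = \frac{1}{2}$ (as also manifested at $R = 0$ by the Shannon-Gallager-Berlekamp upper bound [27, Theorem 4]), but here, no such symmetry exists.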
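The Chernoff distance (119) is straightforward to evaluate for finite alphabets. The sketch below is an illustration under assumed conventions (channels as nested lists `W[x][y]`, natural logarithms, $0 < s < 1$ so that zero-probability outputs can simply be skipped); the function name is hypothetical.

```python
import math

def chernoff_distance(s, x, x_tilde, W, V):
    """d_s(x, x~) = -log sum_y W(y|x)^(1-s) * V(y|x~)^s, for 0 < s < 1."""
    total = 0.0
    for w, v in zip(W[x], V[x_tilde]):
        if w > 0.0 and v > 0.0:       # terms with a zero factor vanish
            total += w ** (1.0 - s) * v ** s
    return math.inf if total == 0.0 else -math.log(total)
```

For a binary symmetric channel with $W = V$, the value at $s = \frac{1}{2}$ is the Bhattacharyya distance between the two input letters, and the distance is symmetric in $s$ around $\frac{1}{2}$; with $W \ne V$ this symmetry is generally lost, which is the point made in Remark 14.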

Remark 15. In Theorem 13 we have assumed a fixed composition code of type $P_X$. As discussed in [?, Problem 10.23 (b)], for ordinary decoding, the exponent (121) is at least as large as the corresponding exponent using Gallager's approach to expurgation [27, Section 5.7], and for the maximizing $P_X$, the two bounds coincide. Thus, for ordinary decoding, the exponent bound (121) offers an improvement over Gallager's approach when the input type $P_X$ is constrained. For joint detection/decoding, there is an additional source of possible improvement: the input type $P_X$ which best suits channel coding is not necessarily the best input type for the detection problem. We also mention that for $R = 0$, an improvement at any given $P_X$ can be obtained by taking the upper concave envelope of (121) (see [?, Problem 10.22] and the discussion in [28, Section II]).

Remark 16. This expurgation technique can be used also for continuous alphabet channels, and specifically, for AWGN channels, see [29, Section 4].

$^{11}$When $s$ is maximized, then the result is the Chernoff information [?, Section 11.9]. For $s = \frac{1}{2}$ this is the Bhattacharyya distance.

C. Exact Random Coding Exponents of Simplified Detectors/Decoders

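We now discuss the random coding exponents achieved by the simplified detectors/decoders $\phi_L$ and $\phi_H$ introduced in Subsection IV-B. We begin with $\phi_L$. For $\gamma \in \mathbb{R}$, let us define

$$t(\bar{Q}_Y, \gamma) \triangleq \min_{Q \in \mathcal{Q}_W:\; Q_Y = \bar{Q}_Y,\; -\alpha - f_W(Q) + \gamma \le 0} I(Q), \qquad (124)$$

the sets $\mathcal{J}_{1,L} \triangleq \mathcal{J}_1$ and

$$\mathcal{J}_{2,L} \triangleq \left\{ Q : t(Q_Y, f_V(Q)) \ge R \right\}, \qquad (125)$$

the exponent

$$E_{A,L} \triangleq \min_{Q \in \cap_{i=1}^{2} \mathcal{J}_{i,L}} D(Q_{Y|X} \| W | P_X), \qquad (126)$$

the sets $\mathcal{K}_{1,L} \triangleq \mathcal{K}_1$, $\mathcal{K}_{2,L} \triangleq \mathcal{K}_2$,

$$\mathcal{K}_{3,L} \triangleq \left\{ (Q, \bar{Q}) : f_V(\bar{Q}) \ge \alpha + f_W(Q) \right\}, \qquad (127)$$

$$\mathcal{K}_{4,L} \triangleq \left\{ (Q, \bar{Q}) : t(Q_Y, f_V(\bar{Q})) \ge R \right\}, \qquad (128)$$

and the exponent$^{12}$

$$E_{B,L} \triangleq \min_{(Q, \bar{Q}) \in \cap_{i=1}^{4} \mathcal{K}_{i,L}} D(Q_{Y|X} \| W | P_X) + [I(\bar{Q}) - R]_+. \qquad (129)$$

In addition, let us define the low-rate detection random coding exponent as

$$E_L^{RC}(R, \alpha, P_X, W, V) \triangleq \min\left\{ E_{A,L}, E_{B,L} \right\}. \qquad (130)$$

Theorem 17. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi_L) \ge E_L^{RC}(R, \alpha, P_X, W, V) - \delta, \qquad (131)$$

$$E_{MD}(\mathcal{C}, \phi_L) \ge E_L^{RC}(R, -\alpha, P_X, V, W) - \delta. \qquad (132)$$

The proof can be found in Appendix B.

Next, we discuss the random coding exponents of $\phi_H$. As this is a simple hypothesis testing between two memoryless sources $\tilde{W}$ and $\tilde{V}$, the standard analysis [30] and [?, Section 11.7] is applicable verbatim. For given $0 \le \mu \le 1$, let

$$Q_\mu(y) \triangleq \frac{\tilde{W}^{\mu}(y) \tilde{V}^{1-\mu}(y)}{\sum_{y' \in \mathcal{Y}} \tilde{W}^{\mu}(y') \tilde{V}^{1-\mu}(y')} \qquad (133)$$

for all $y \in \mathcal{Y}$, and let us define the high-rate detection random coding exponent as

$$E_H^{RC}(R, \alpha, P_X, W, V) \triangleq D(Q_{\mu(\alpha)} \| \tilde{W}), \qquad (134)$$

where $\mu(\alpha)$ is chosen so that

$$D(Q_{\mu(\alpha)} \| \tilde{W}) - D(Q_{\mu(\alpha)} \| \tilde{V}) = -\alpha. \qquad (135)$$

Theorem 18. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi_H) \ge E_H^{RC}(R, \alpha, P_X, W, V) - \delta, \qquad (136)$$

$$E_{MD}(\mathcal{C}, \phi_H) \ge E_H^{RC}(R, \alpha, P_X, W, V) - \alpha - \delta. \qquad (137)$$

Proof: The proof follows the standard analysis in [?, Section 11.7]. ■

$^{12}$It can be noticed that the only difference between $\mathcal{K}_{3,L}, \mathcal{K}_{4,L}$ and $\mathcal{K}_3, \mathcal{K}_4$ is the exclusion of the $I(\bar{Q}) - R$ terms and the replacement of $s(Q_Y, \gamma)$ with $t(Q_Y, \gamma)$.

Remark 19. The decoder $\phi_H$ and its random coding exponents do not depend on the rate $R$.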
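The tilted distribution (133) and the parameter $\mu(\alpha)$ can be computed numerically with a one-dimensional search. The sketch below is a minimal illustration, assuming strictly positive output distributions given as lists and treating the right-hand side of (135) as a generic `target` value supplied by the caller; the function names are hypothetical. It relies on the fact that $D(Q_\mu\|\tilde{W}) - D(Q_\mu\|\tilde{V})$ decreases from $D(\tilde{V}\|\tilde{W})$ at $\mu = 0$ to $-D(\tilde{W}\|\tilde{V})$ at $\mu = 1$, so bisection applies.

```python
import math

def tilted(mu, W_out, V_out):
    """Q_mu(y) proportional to W_out(y)^mu * V_out(y)^(1-mu)."""
    weights = [(w ** mu) * (v ** (1.0 - mu)) for w, v in zip(W_out, V_out)]
    z = sum(weights)
    return [t / z for t in weights]

def kl(P, Q):
    """D(P || Q) in nats, with 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0.0)

def solve_mu(target, W_out, V_out, iters=100):
    """Bisection for mu in [0, 1] with D(Q_mu||W) - D(Q_mu||V) = target."""
    def gap(mu):
        Q = tilted(mu, W_out, V_out)
        return kl(Q, W_out) - kl(Q, V_out)
    lo, hi = 0.0, 1.0
    if not (gap(hi) <= target <= gap(lo)):
        raise ValueError("target outside the range attainable for mu in [0, 1]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > target:     # gap is decreasing in mu
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Given the resulting `mu`, the exponent in (134) is obtained as `kl(tilted(mu, W_out, V_out), W_out)`.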


D. Gallager/Forney-Style Exponents 

Next, we derive achievable exponents using the classical Gallager/Forney technique. 

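1) Random Coding Exponents: For a given distribution $\{P_X(x)\}_{x \in \mathcal{X}}$, and parameters $s, \rho$, define

$$E'_0(s, \rho) \triangleq -\log\left[ \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x) W^{(1-s)/\rho}(y|x) V^{s/\rho}(y|x) \right)^{\rho} \right] \qquad (138)$$

and

$$E''_0(s, \rho) \triangleq -\log\left[ \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x) W^{(1-s)/\rho}(y|x) \right)^{\rho} \left( \sum_{x \in \mathcal{X}} P_X(x) V^{s/\rho}(y|x) \right)^{\rho} \right] \qquad (139)$$

and let the Gallager/Forney detection random coding exponent be defined as

$$E_G^{RC}(R, \alpha, P_X, W, V) \triangleq \max_{0 \le s \le 1,\; \max\{s, 1-s\} \le \rho \le 1} \min\left\{ \alpha s + E'_0(s, \rho) - (\rho - 1)R,\; \alpha s + E''_0(s, \rho) - (2\rho - 1)R \right\}. \qquad (140)$$

Theorem 20. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi^*) \ge E_G^{RC}(R, \alpha, P_X, W, V) - \delta, \qquad (141)$$

$$E_{MD}(\mathcal{C}, \phi^*) \ge E_G^{RC}(R, \alpha, P_X, W, V) - \alpha - \delta. \qquad (142)$$

The proof can be found in Appendix C.

2) Expurgated Exponents: For a given distribution $\{P_X(x)\}_{x \in \mathcal{X}}$ and parameters $s, \rho$, define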


K[s)^-\og 


'^Px{x)'^W^ ^{y\x)V^{y\x) 

xGX y&y 


and 


^-log 

and let the Gallager/Eorney detection expurgated exponent be defined as 


Y1 Pxix)V^{y\x) 
yey \xex ) \xex / 


(143) 


(144) 


{R,a, Px,W,V) = sup min |sa + £'^(s), sa +-E^^(s) — pi?} . (145) 

0<s<l,p>l 

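Theorem 21. Let a distribution $P_X$ and a parameter $\alpha \in \mathbb{R}$ be given. Then, there exists a sequence of codes $\mathcal{C} = \{\mathcal{C}_n\}_{n=1}^{\infty}$ of rate $R$ such that for any $\delta > 0$

$$E_{FA}(\mathcal{C}, \phi^*) \ge E_G^{EX}(R, \alpha, P_X, W, V) - \delta, \qquad (146)$$

$$E_{MD}(\mathcal{C}, \phi^*) \ge E_G^{EX}(R, \alpha, P_X, W, V) - \alpha - \delta. \qquad (147)$$

The proof can be found in Appendix D.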


E. Discussion 

We summarize this section with the following discussion.

1) Monotonicity in the rate: The ordinary random coding exponents are decreasing with the rate $R$, and vanish at $I(P_X \times W)$. By contrast, the detection exponents are not necessarily so. Indeed, the exponent $E_A$ of (47) is increasing with the rate. For the exponent $E_B$ of (52), as $R$ increases, the objective function decreases and $\mathcal{K}_3$ expands, but the set $\mathcal{K}_4$ diminishes$^{13}$, and so no monotonicity is assured for $E_B$, and as a result, also for $E_T^{RC}(R, \alpha, P_X, W, V)$. The same holds for $E_L^{RC}$, whereas $E_H^{RC}$ does not depend on $R$ at all. The expurgated exponent $E_T^{EX}(R, \alpha, P_X, W, V)$ of (121) decreases in $R$. To gain intuition, recall from (63) that when $I(\bar{Q}) < R$ the type enumerator $N(\bar{Q}|\mathbf{y})$ concentrates double-exponentially rapidly around its average $\doteq \exp[n(R - I(\bar{Q}))]$. Thus, for any given $\mathbf{y}$, an increase of the rate will introduce codewords having a joint type that was not typically seen at lower rates, and this new joint type might dominate one of the likelihoods. However, it is not clear to which direction this new type will tip the scale in the likelihoods comparison, and so the rate increase does not necessarily imply an increase or a decrease of one of the exponents. In addition, the above discussion and (21) imply that the largest achievable rate such that the decoding error probability vanishes as $n \to \infty$ may still be the mutual information $I(P_X \times W)$, or, in other words, the detection does not cause a rate loss.

2) Computation of the exponents: Unfortunately, the optimization problems involved in computing the exact exponents of Subsections V-A and V-C are usually not convex, and might be complex to solve when the alphabets are large. For example, for the exact exponents, computing $E_A$ of (47) is not a convex optimization problem since $\mathcal{J}_2$ is not a convex set of $Q$, and computing $E_B$ of (52) is not a convex optimization problem since $\mathcal{K}_3$ and $\mathcal{K}_4$ are not convex sets of $(Q, \bar{Q})$, and not even of $(Q_{Y|X}, \bar{Q}_{Y|X})$. An algorithm for their efficient computation is an important open problem. However, the expurgated exponent (121) is concave$^{14}$ in $s$ and convex in $P_{X\tilde{X}}$. This promotes the importance of the lower bounds derived in Subsection V-D, which only require two-dimensional optimization problems, irrespective of the alphabet sizes.

$^{13}$As its r.h.s. always increases, but its l.h.s. does not.

3) Choice of input distribution: Thus far, the input distribution $P_X$ was assumed fixed, but it can obviously be optimized. Nonetheless, there might be a tension between the optimal choice for channel coding versus the optimal choice for detection. For example, consider the detection problem between $W$, a Z-channel, i.e. $W(0|0) = 1$, $W(0|1) = w$ for some $0 < w < 1$, and $V$, an S-channel, i.e. $V(1|0) = v$, $V(1|1) = 1$ for some $0 < v < 1$. Choosing $P_X(0) = 1$ will result in infinite FA and MD exponents (upon appropriate choice of $\alpha$), but is useless from the channel coding perspective. One possible remedy is to define a Lagrangian that weighs, e.g., the FA and ordinary decoding exponents with some weight, and optimize it over the input type. However, the resulting optimization might still be intractable.

4) Simplified decoders: Intuitively, the low-rate simplified detector/decoder $\phi_L$ has a worse FA-MD trade-off than the optimal detector/decoder $\phi'$, since the effect of a non-typical codeword may be averaged out in $\frac{1}{M}\sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m)$ but may totally change $\max_{1 \le m \le M} W(\mathbf{y}|\mathbf{x}_m)$. However, there exists a critical rate $R_c$ such that for all $R < R_c$ the exponents of the two detectors/decoders coincide, when using the same parameter $\alpha$. To see this, first let

$$Q_A \triangleq \arg\min_{Q \in \mathcal{J}_1} D(Q_{Y|X} \| W | P_X), \qquad (148)$$

i.e. the exponent $E_A$ for $R = 0$, and in fact, for all rates satisfying

$$R < s(Q_{A,Y}, f_V(Q_A)) \triangleq R_{c,A}. \qquad (149)$$

Since from Remark 28 (Appendix B)

$$s(Q_Y, \gamma) \le t(Q_Y, \gamma), \qquad (150)$$

this is also the exponent $E_{A,L}$. Now, letting $R = 0$ in $\{\mathcal{K}_i\}_{i=1}^{3}$ and then solving

$$(Q_B, \bar{Q}_B) \triangleq \arg\min_{(Q, \bar{Q}) \in \cap_{i=1}^{3} \mathcal{K}_i} \left\{ D(Q_{Y|X} \| W | P_X) + I(\bar{Q}) \right\} \qquad (151)$$

we get the exponent $E_B$ for $R = 0$, and in fact, for all rates satisfying

$$R < \min\left\{ I(\bar{Q}_B),\; s(Q_{B,Y}, f_V(\bar{Q}_B)) \right\} \triangleq R_{c,B}. \qquad (152)$$

Similarly, this is also the exponent $E_{B,L}$. In conclusion, for all $R < R_c \triangleq \min\{R_{c,A}, R_{c,B}\}$ it is assured that the FA exponents of $\phi'$ and $\phi_L$ are exactly the same. In the same manner, a critical rate can be found for the MD exponent. For the high-rate simplified detector/decoder $\phi_H$ we only remark that in some cases, the output distributions $\tilde{W}$ and $\tilde{V}$ may be equal, and so this detector/decoder is useless, even though $\phi'$ achieves strictly positive exponents (cf. the example in Section VII).

$^{14}$The second derivative w.r.t. $s$ of $-d_s(x, \tilde{x})$ is the variance of $\log\frac{W(y|x)}{V(y|\tilde{x})}$ w.r.t. the distribution $P_Y$ which satisfies $P_Y(y) \propto W^{1-s}(y|x) V^{s}(y|\tilde{x})$.

5) Continuous alphabet channels: As previously mentioned, one of the advantages of the Gallager/Forney-Style 
hounds is their simple generalization to continuous channels with input constraints. We hriefly descrihe this 
well known technique Chapter 7]. For concreteness, let us focus on the power constraint E[7f^] < 1. 
In this technique a one-dimensional input distrihution is chosen, say with density fx{x), which satisfies the 
input constraint. Then, an n-dimensional distrihution is defined as follows 


/„(x) = V> ^lln-5 <'^xl^^<n\WPx{xi), (153) 

I i=l ) i=l 

where ijj is & normalization factor. This distrihution corresponds to a uniform distrihution over a thin n- 
dimensional spherical shell, which is the surface of the n-dimensional ‘hall’ of sequences which satisfy the 
input constraint. While this input distrihution is not memoryless, it is easily upper hounded hy a memoryless 
distrihution: hy introducing a parameter r > 0, and using 


I < n — 5 < Xm,i < > < exp 


i=l 


xY -n +5 


\ 2 = 1 


we get 


Now, e.g., in the derivation in (IC.9I) we may use 


(154) 


(155) 


2=1 


E 


’iy‘^"^Vp(y|x.^)r/'’(y|X^)l = / /„(x)VF'^-“V'>(y|X^)r/'’(y|X„,)dx 

fx kFlx)!/"/" {yi\x)dx 




(156) 

(157) 


As discussed in ll^ p. 341], the term is suh-exponential, and can he disregarded. Now, the resulting 

exponential functions can he modified. For example, for a pair of power constrained AWGN channels W and 
V, we may definj^ 

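$$E_0'(s,\rho,r) \triangleq -\log \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f_X(x)\, e^{r(x^2-1)}\, W^{(1-s)/\rho}(y|x)\, V^{s/\rho}(y|x)\, dx \right]^{\rho} dy \qquad (158)$$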


'^Since the additive noise has a density, the probability distributions in the bounds of subsection I V-DI can be simply replaced by densities, 
and the summations can be replaced by integrals. 
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where the dependence in r was made explicit, and similarly, 

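$$E_0''(s,\rho,r_1,r_2) \triangleq -\log \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f_X(x)\, e^{r_1(x^2-1)}\, W^{(1-s)/\rho}(y|x)\, dx \right]^{\rho} \left[ \int_{-\infty}^{\infty} f_X(x)\, e^{r_2(x^2-1)}\, V^{s/\rho}(y|x)\, dx \right]^{\rho} dy, \qquad (159)$$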

which requires two new parameters $r_1, r_2$. Then, the exponent in (140) can be computed exactly in the same
way, with additional maximization over non-negative $r, r_1, r_2$. To obtain an explicit bound, it is required to
choose an input distribution. The natural choice is the Gaussian distribution, which is appropriate from the
channel coding perspective, and also enables analytic bounds to be obtained. Of course, it might be very far from
being optimal for the purpose of pure detection. Then, the integrals in (158) can be solved by 'completing
the square' in the exponent of the Gaussian distribution, and the optimal values of $r$ and $\rho$ can be found
analytically [Section 7.4]. Here, since two channels are involved, and we also need to optimize over
$s$, we have not been able to obtain simple expressions. Nonetheless, the required optimization problem is
only four-dimensional, and can be easily solved by an exhaustive search. Finally, it can be noticed that
computing the expurgated bounds is a similar problem, as


$$E_x'(s,r) = E_0'(s,\rho=1,r) \qquad (160)$$

and

$$E_x''(s,r) = E_0''(s,\rho=1,r). \qquad (161)$$


6) Comparison with [10]: As mentioned in the introduction (Section I), the problem studied here is a generalization
of [10]. Indeed, when the channel $V$ does not depend on the input, i.e. $V(y|x) = Q_0(y)$, then the
problem studied in [10] is obtained. Of course, the detectors derived in Section IV can be used directly
for this special case. Moreover, the exponent expressions can be slightly simplified as follows. A joint type
$Q$ is feasible if and only if $f_W(P_X \times Q_Y) \le -\alpha + f_V(P_X \times Q_Y)$, both in $E_A$ of (47) and $E_B$ of (52), as
otherwise, the sets $\mathcal{J}_2$ and $\mathcal{K}_4$ are empty. For any such $Q$ which satisfies this condition, when utilizing the
fact that $f_V(\bar{Q})$ depends only on $\bar{Q}_Y = Q_Y$, the optimal choice for $E_B$ is $\bar{Q} = P_X \times Q_Y$, since it results in
$I(\bar{Q}) = 0$. Under this choice, we get $\mathcal{J}_1 \subseteq \mathcal{K}_3$ and $\mathcal{J}_2 \subseteq \mathcal{K}_4$, and so $E_A \ge E_B$. Thus, from (53),


E^{R,a,Px,W,V)= min D{Qy\x\\W\Px) (162) 

where 

$$\mathcal{M}_3 \triangleq \left\{ Q : f_V(Q) \ge \alpha + f_W(Q) - R \right\}, \qquad (163)$$

’^Nevertheless, it should be recalled that Gaussian input is optimal at high rates (above some critical rate). At low rates, the optimal input 
distribution is not known, even for pure channel coding. 

’^Namely, the identities exp \_—aC — bt\ dt = • e'^ and exp [“®t] 'E' ~ 

’’’Nonetheless, for a given s, the expression for Eq{s, p, r) is rather similar to the ordinary decoding exponent Eo{p, r) and so the optimal 
p and r can be analytically found. 

’^The meaning of FA and MD here is opposite to their respective meaning in ca, as sanctioned by the motivating applications. 
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replaces /C 3 , and 

Ma = [q-. s (Qy, MQy) + r)>r}, (164) 

replaces /C 4 . Thus, the minimization in the exponent is only on Q. 


VI. Composite Detection 

Up until now, we have assumed that detection is performed between two simple hypotheses, namely $W$ and $V$.
In this section, we briefly discuss the generalization of the random coding analysis to composite hypotheses, to
wit, a detection between a channel $W \in \mathcal{W}$ and a channel $V \in \mathcal{V}$, where $\mathcal{W}$ and $\mathcal{V}$ are disjoint. Due to the nature
of the problems outlined in the introduction (Section I), we adopt a worst-case approach. For a codebook $\mathcal{C}_n$ and
a given detector/decoder $\phi$, we generalize the FA probability to

$$P_{\mathrm{FA}}(\mathcal{C}_n,\phi) \triangleq \max_{W\in\mathcal{W}} \frac{1}{M}\sum_{m=1}^{M} W(\mathcal{R}_0|\mathbf{x}_m), \qquad (165)$$

and analogously, the MD and IE probabilities are obtained by maximizing over $V \in \mathcal{V}$ and $W \in \mathcal{W}$, respectively.
Then, the trade-off between the IE probability and the FA and MD probabilities is defined in exactly the same
way as in the simple-hypotheses case.

Just as we have seen in (l22l) (proof of Proposition O, for any sequence of codebooks Cn and decoder (j) 


where here, $E_O(\mathcal{C}_n,\phi)$ is the exponent achieved by an ordinary decoder, which is not aware of $W$. Thus, the
asymptotic separation principle holds here too, in the sense that the optimal detector/decoder may first use a detector
which achieves the optimal trade-off between the FA and MD exponents, and then a decoder which achieves the
optimal ordinary exponent.

We next discuss the achievable random coding exponents. As is well known, the maximum mutual information
decoder [Chapter 10, p. 147] universally achieves the random coding exponent for ordinary decoding. So, as in the simple hypotheses
case, it remains to focus on the optimal trade-off between the FA and MD exponents, namely, solve


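$$\text{minimize} \quad P_{\mathrm{FA}}$$

$$\text{subject to} \quad P_{\mathrm{MD}} \le e^{-n\overline{E}_{\mathrm{MD}}} \qquad (167)$$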


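for some given exponent $\overline{E}_{\mathrm{MD}} > 0$. The next lemma shows that the following universal detector/decoder $\phi^*$, whose
rejection region is

$$\mathcal{R}_0^* \triangleq \left\{ \mathbf{y} : e^{n\alpha} \cdot \sum_{m=1}^{M} \max_{W\in\mathcal{W}} W(\mathbf{y}|\mathbf{x}_m) \le \sum_{m=1}^{M} \max_{V\in\mathcal{V}} V(\mathbf{y}|\mathbf{x}_m) \right\}, \qquad (168)$$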

In universal decoding, typically only the random coding exponents are attempted to be achieved, cf. Remark 25.
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solves (I167I ). The universality here is in the sense of (11671 ). i.e., achieving the best worst-case (over W) FA exponent, 
under a worst-case constraint (over $\mathcal{V}$) on the MD exponent. There might be, however, a loss in exponents compared
to a detector which is aware of the actual pair $(W, V)$ (cf. Corollary 23).
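To make the universal rejection rule concrete, the sketch below evaluates a generalized-likelihood test of the form $e^{n\alpha}\sum_m \max_{W\in\mathcal{W}} W(\mathbf{y}|\mathbf{x}_m) \le \sum_m \max_{V\in\mathcal{V}} V(\mathbf{y}|\mathbf{x}_m)$ for finite sets of BSCs. This is an illustration under stated assumptions: the direction of the inequality is our reading of (168), the channel sets are represented by hypothetical lists of crossover probabilities, and the function names are not from the paper.

```python
import math


def bsc_likelihood(y, x, p):
    """Probability of output sequence y given input x over a BSC(p)."""
    flips = sum(a != b for a, b in zip(x, y))
    return (p ** flips) * ((1.0 - p) ** (len(y) - flips))


def in_rejection_region(y, codebook, w_set, v_set, alpha):
    """Return True if y falls in the (assumed) universal rejection region.

    w_set and v_set are finite lists of crossover probabilities standing in
    for the channel families W and V; alpha is the trade-off parameter.
    """
    n = len(y)
    lhs = math.exp(n * alpha) * sum(
        max(bsc_likelihood(y, x, w) for w in w_set) for x in codebook
    )
    rhs = sum(max(bsc_likelihood(y, x, v) for v in v_set) for x in codebook)
    return lhs <= rhs
```

For example, with the toy codebook `[(0, 0, 0, 0), (1, 1, 1, 1)]`, `w_set = [0.05, 0.1]`, `v_set = [0.4, 0.45]` and `alpha = 0.0`, an output equal to a codeword is not rejected, while an output at Hamming distance 2 from both codewords is.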

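Lemma 22. Let $\mathcal{C} = \{\mathcal{C}_n\}$ be a given sequence of codebooks, let $\phi^*$ be as above, and let $\phi$ be any other partition
of $\mathcal{Y}^n$ into $M + 1$ regions. Then, if $E_{\mathrm{FA}}(\mathcal{C},\phi) \ge E_{\mathrm{FA}}(\mathcal{C},\phi^*)$ then $E_{\mathrm{MD}}(\mathcal{C},\phi) \le E_{\mathrm{MD}}(\mathcal{C},\phi^*)$.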

Proof: The idea is that the maximum in (165) can be interchanged with the sum without affecting the
exponential behavior. Specifically, let us define the set of channels which maximize $f_W(Q)$ for some $Q$

$$\mathcal{W}_U \triangleq \left\{ W \in \mathcal{W} : \exists Q \text{ such that } W = \arg\max_{W'\in\mathcal{W}} f_{W'}(Q) \right\}. \qquad (169)$$

Clearly, since $f_W(Q)$ is only a function of the joint type, the cardinality of the set $\mathcal{W}_U$ is not larger than the
number of different joint types, and so its cardinality increases only polynomially with $n$. Then,

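$$P_{\mathrm{FA}}(\mathcal{C}_n,\phi) = \max_{W\in\mathcal{W}} \sum_{\mathbf{y}\in\mathcal{R}_0} \frac{1}{M}\sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) \qquad (170)$$

$$\le \sum_{\mathbf{y}\in\mathcal{R}_0} \frac{1}{M}\sum_{m=1}^{M} \max_{W\in\mathcal{W}} W(\mathbf{y}|\mathbf{x}_m) \qquad (171)$$

$$= \sum_{\mathbf{y}\in\mathcal{R}_0} \frac{1}{M}\sum_{m=1}^{M} \max_{W\in\mathcal{W}_U} W(\mathbf{y}|\mathbf{x}_m) \qquad (172)$$

$$\triangleq \sum_{\mathbf{y}\in\mathcal{R}_0} g(\mathbf{y}) \qquad (173)$$

$$\le \sum_{\mathbf{y}\in\mathcal{R}_0} \frac{1}{M}\sum_{m=1}^{M} \sum_{W\in\mathcal{W}_U} W(\mathbf{y}|\mathbf{x}_m) \qquad (174)$$

$$= \sum_{W\in\mathcal{W}_U} \frac{1}{M}\sum_{m=1}^{M} \sum_{\mathbf{y}\in\mathcal{R}_0} W(\mathbf{y}|\mathbf{x}_m) \qquad (175)$$

$$\doteq \max_{W\in\mathcal{W}_U} \frac{1}{M}\sum_{m=1}^{M} \sum_{\mathbf{y}\in\mathcal{R}_0} W(\mathbf{y}|\mathbf{x}_m) \qquad (176)$$

$$\le \max_{W\in\mathcal{W}} \frac{1}{M}\sum_{m=1}^{M} \sum_{\mathbf{y}\in\mathcal{R}_0} W(\mathbf{y}|\mathbf{x}_m) \qquad (177)$$

$$= P_{\mathrm{FA}}(\mathcal{C}_n,\phi), \qquad (178)$$

where the measure $g(\mathbf{y})$ was implicitly defined. Thus, up to a sub-exponential term which does not affect exponents,

$$P_{\mathrm{FA}}(\mathcal{C}_n,\phi) \doteq \sum_{\mathbf{y}\in\mathcal{R}_0} g(\mathbf{y}). \qquad (179)$$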
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Similarly, defining the measure 

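$$h(\mathbf{y}) \triangleq \frac{1}{M}\sum_{m=1}^{M} \max_{V\in\mathcal{V}} V(\mathbf{y}|\mathbf{x}_m) \qquad (180)$$

we get

$$P_{\mathrm{MD}}(\mathcal{C}_n,\phi) \doteq \sum_{\mathbf{y}\in\overline{\mathcal{R}}_0} h(\mathbf{y}). \qquad (181)$$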

Now, the ordinary Neyman-Pearson lemma [Theorem 11.7.1] can be invoked to show that the optimal detector
is of the form (168), which completes the proof. ■
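The proof above relies on the number of joint types growing only polynomially with $n$. A minimal sketch of that count, using the standard stars-and-bars formula and the usual $(n+1)^{|\mathcal{X}||\mathcal{Y}|}$ bound, is given below; the function names are ours and serve only as an illustration.

```python
from math import comb


def num_joint_types(n, nx, ny):
    """Number of joint types with denominator n on an alphabet of size nx * ny.

    A joint type assigns non-negative integer counts summing to n to the
    nx * ny symbol pairs, so the count is a stars-and-bars binomial.
    """
    k = nx * ny
    return comb(n + k - 1, k - 1)


def polynomial_bound(n, nx, ny):
    """Standard polynomial upper bound (n + 1)^(nx * ny) on the number of types."""
    return (n + 1) ** (nx * ny)
```

For binary input and output alphabets, `num_joint_types(100, 2, 2)` is 176851, far below the $2^{100}$ output sequences, which is why maximizing over one channel per joint type costs nothing in the exponent.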

It now remains to evaluate, for a given pair of channels $(W, V) \in \mathcal{W} \times \mathcal{V}$, the resulting random coding exponents
when 4>^ is used. Fortunately, this is an easy task given Theorem Let us define the generalized normalized 
log-likelihood ratio of the set of channels $\mathcal{W}$ as

$$f_{\mathcal{W}}(Q) \triangleq \max_{W\in\mathcal{W}} \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} Q(x,y) \log W(y|x). \qquad (182)$$

The following is easily verified. 

Corollary 23 (to Theorem [3. Let a distribution Px and a parameter a G R 7*^ given. Then, there exists a sequence 
of codes C = {Cn}^i of rate R, such that for any (5 > 0 


(C, {R, a, Px,W, V) - 6, (183) 

Emd (C, (If) > {R, a,Px,W,V)-a-d (184) 

where E^^^j {R,a, Px,W,V) is defined as E^^ {R,a, Px,W,V) of (|5^ . but replacing fw{Q) with fw{Q) and 
fv{Q) with /v(Q) in all the definitions preceding Theorem^ 

We conclude with a few remarks. 

Remark 24. The function $f_{\mathcal{W}}(Q)$ is a convex function of $Q$ (as a pointwise maximum of linear functions), but
not a linear function. This may make the optimization problems involved in computing the exponents harder. Also, we
implicitly assume that the set of channels $\mathcal{W}$ is sufficiently 'regular', so that $f_{\mathcal{W}}(Q)$ is a continuous function of $Q$.
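The convexity (and non-linearity) of the generalized log-likelihood ratio can be observed numerically. The sketch below assumes a finite family of BSCs, represented by a hypothetical list of crossover probabilities, and evaluates $f_{\mathcal{W}}(Q) = \max_{W} \sum_{x,y} Q(x,y)\log W(y|x)$ on symmetric joint types; all names are illustrative.

```python
import math


def bsc(p):
    """Transition matrix W[x][y] of a BSC with crossover probability p."""
    return [[1.0 - p, p], [p, 1.0 - p]]


def f_single(Q, W):
    """Normalized log-likelihood sum_{x,y} Q(x,y) log W(y|x), in nats."""
    return sum(Q[x][y] * math.log(W[x][y]) for x in range(2) for y in range(2))


def f_set(Q, crossovers):
    """Generalized version: maximum of f_single over a finite set of BSCs."""
    return max(f_single(Q, bsc(p)) for p in crossovers)


def symmetric_type(q):
    """Joint type of a uniform input passed through a BSC(q)."""
    return [[0.5 * (1.0 - q), 0.5 * q], [0.5 * q, 0.5 * (1.0 - q)]]


def mix(Q1, Q2, t):
    """Convex combination t * Q1 + (1 - t) * Q2 of two joint types."""
    return [[t * Q1[x][y] + (1.0 - t) * Q2[x][y] for y in range(2)] for x in range(2)]
```

With the family `[0.05, 0.2]`, the value at the midpoint of `symmetric_type(0.02)` and `symmetric_type(0.4)` lies strictly below the average of the endpoint values, so the function is convex but not linear along this segment.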

Remark 25. The same technique works for the simplified low-rate detector/decoder. Unfortunately, since the bound

(A.4) (Appendix A) utilizes the structure of the optimal detector/decoder, it is difficult to generalize the bounds

which rely on it, namely, the expurgated exponents and the Gallager/Forney-style bounds. This is common to many

other problems in universal decoding; for a non-exhaustive list of examples, see [32], [33], [34], [35], [36].

Remark 26. A different approach to composite hypothesis testing is the competitive minimax approach lITTl . In this 

approach, a detector/decoder is sought which achieves the largest fraction of the error exponents achieved for a 

Note that the Neyman-Pearson lemma is also valid for general positive measures, not just for probability distributions. This can also be
seen from the Lagrange formulation [28].
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detection of only a pair of channels (W, V), uniformly over all possible pairs of channels {W, V). The application 
of this method on generalized decoders was exemplified for Forney’s erasure/list decoder ifTTl in ll3^ . |[3^ . and 
the same techniques can work for this problem. 

VII. An Example: A Detection of a Pair of Binary Symmetric Channels

Let $W$ and $V$ be a pair of BSCs with crossover probabilities $w \in (0,1)$ and $v \in (0,1)$, respectively. In this case
the exponent bounds of Section V can be greatly simplified, if the input distribution is uniform, i.e. $P_X = (\frac{1}{2}, \frac{1}{2})$.

Indeed, in Appendix E we provide simplified expressions for the type-enumeration based exponents. Interestingly,
while this input distribution is optimal from the channel coding perspective, the two output distributions it
induces (under $W$ and under $V$) are also uniform, and so the simple decoder which only uses the output statistics, namely $\phi_H$ of Subsection
IV-B, is utterly useless. However, the optimal decoder $\phi'$ can produce strictly positive exponents.

We have plotted the FA exponent versus the MD exponent for the detection between two BSCs with $w = 0.1$
and $v = 0.4$. We have assumed the uniform input distribution $P_X = (\frac{1}{2}, \frac{1}{2})$, which results in the capacity $C_W \triangleq
I(P_X \times W) \approx 0.37$ (nats). Figure 1 shows that at zero rate, the expurgated bound which is based on type
enumeration significantly improves the random coding bound. In addition, the Gallager/Forney-style random coding
exponent coincides with the exact exponent. By contrast, the Gallager/Forney-style expurgated exponent offers no
improvement over the ordinary random coding bound (and is thus not displayed). Figure 2 shows that at $R = 0.5 \cdot C_W$,
the simplified low-rate detector/decoder $\phi_L$ still performs as well as the optimal detector/decoder $\phi'$. This, in fact,
continues to hold for all rates less than $R \approx 0.8 \cdot C_W$. In addition, it is evident that the Gallager/Forney-style random
coding exponent is a poor bound, which exemplifies the importance of the ensemble-tight bounding technique of
the type enumeration method.
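The two numerical facts used in this example, the capacity value of about 0.37 nats for a BSC with crossover 0.1 and the uniformity of both induced output distributions, can be reproduced with a few lines. This is a small check written for illustration; the function names are ours.

```python
import math


def binary_entropy(q):
    """Binary entropy h_B(q) in nats, with the convention 0 log 0 = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def bsc_capacity_nats(w):
    """Mutual information of a BSC(w) under the uniform input, in nats."""
    return math.log(2.0) - binary_entropy(w)


def output_distribution(px, p):
    """Output distribution of a BSC(p) driven by the input distribution px."""
    y0 = px[0] * (1.0 - p) + px[1] * p
    return (y0, 1.0 - y0)
```

Since both `output_distribution((0.5, 0.5), 0.1)` and `output_distribution((0.5, 0.5), 0.4)` are uniform, a detector that looks only at the output statistics cannot distinguish the two channels, in line with the remark above.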


Appendix A 
Proof of Theorem [T3] 

Before getting into the proof, we derive a standard bound on the FA probability, which will also be used in
Appendices C and D. For any given code and $s \ge 0$
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Figure 1. The trade-off between the FA exponent and the MD exponent at i? = 0, for the detection of a BSC W with crossover probability 
0.1, from a BSC V with crossover probability 0.4, when using the optimal detector $\phi'$. The solid line corresponds to the exact random
coding exponent, and also to the Gallager/Forney-style random coding exponent. The dashed line corresponds to the expurgated exponent. 



Figure 2. The trade-off between the FA exponent and the MD exponent at $R = 0.5 \cdot C_W$, for the detection of a BSC W with crossover
probability 0.1, from a BSC V with crossover probability 0.4. The solid line corresponds to the exact random coding exponent of $\phi'$, and
also to the exact random coding exponent of $\phi_L$. The dotted line corresponds to the Gallager/Forney-style random coding exponent of $\phi'$.


where (a) is from (fTTI) . 

Proof of Theorem 17?} For a given code Cn, a codeword 1 < m < M, and a joint type Pxx^ define the type 
class enumerator 

N^{P^j^,Cn) = 



(A.5) 
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Upon restricting 0 < s < 1 in (IA.4I ). we obtain the bound 


P,A{CnA')<e-^^^ Y, 



yey^ 

(a) 

1 

< 

—nas 


M 

1 

(A) 

—nas f 


M 


M 

i 7 E 


M ^ 

m=l 
M M 

EEE 

m=l k=ly£y" 
M 

EE''"'( 

m=l P„i> 


1-s r 


M 


M 


E 


m=l 


—n ( Ep. 


ds{X,X) 


(A.6) 

(A.7) 

(A.8) 


where (a) follows from Yhi ^ (E.«*r for p < 1, and (6) is using (IA.5I) and (II191) . Now, the packing lemma 
ifT^ Problem 10.2] essentially shows (see also |[2^ Appendix]) that for any (5 > 0, there exists a code C* (of rate 
R) such that 

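$$N_m(P_{X\tilde{X}}, \mathcal{C}_n^*) \le \begin{cases} \exp\left[ n \left( R + \delta - I(P_{X\tilde{X}}) \right) \right], & I(P_{X\tilde{X}}) \le R + \delta \\ 0, & I(P_{X\tilde{X}}) > R + \delta \end{cases} \qquad (A.9)$$

for all $1 \le m \le M$ and $P_{X\tilde{X}}$. This, along with Proposition 4, completes the proof of the theorem. ■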

Appendix B 
Proof of Theorem [TtI 

The proof is very similar to the proof of Theorem 6. We will use the following lemma, which is analogous to
Lemma 9.

Lemma 27. Under the conditions of Lemma |9] 


PI {l{iV(Q|y) > l} <e”AQ)| 

\Q^Q' Qy=Qy 


= l-o(n), T{Qy\J.,Q)>R 
= Otherwise 


(B.l) 


where y G T{Qy), and 


Proof: We have 


T(Qy;J,Q)= min I{Q). 

QeQ:Q=Qy,J(Q)<0 


(B.2) 


n {l{A^(Q|y)>l}<e"^(«)} =P n {I{iV(g|y) = 0}} I . 

\Q^Q'‘Qy=Qy / \Q£Q-Qy=Qy,J(Q)^0 

From this point onward, the proof follows the same lines of the proof of Lemma 

Remark 28. Remarks [TOl and ITT] are also valid here. If J{Q) is convex in Qy\x Lagrange duality 
5] implies 

T{Qy',J,Q)= min max [/(Q) + AJ(g)] 

QeQ:Q=QY ^>0 


(B.3) 


Chapter 


(B.4) 
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= max min [/(Q) + AJ((5)] . 
^>0 QeQ:Q=QY 


(B.5) 


The only difference from S((5 y; J, Q) of (17^ in this case is the maximization domain for A. Note that the function 
t((5y; 7 ) of (11241) is a specific instance of T(Qy; ■, •) defined in (IB.2I) wifh Q = Qw and J{Q) = —a — fw{Q) + l 
which is convex in Qy\x (ir* facf, linear). 

Proof of Theorem \T7\ In general, since 


M 


iy(y|x^) = ^iV(g|y), 


,nfw{Q) 


(B. 6 ) 


m=2 


Q 


huf 


max W{y\xm) = maxi{A^(g|y) > 
2<m<M Q 

= Y^i^iQly) > 

Q 


(B.7) 

(B. 8 ) 


then the analysis of the FA exponent of $\phi_L$ follows the same lines as the analysis in the proof of Theorem 6 when
replacing $N(Q|\mathbf{y})$ with $\mathbb{I}\{N(Q|\mathbf{y}) \ge 1\}$. Thus, in the following we only highlight the main changes. Just as in the
derivations leading to (80),


PFA(xi,y) = P(y G 77 o,l|Xi = xi, Y = y) 


= max 




(B.9) 

(B.IO) 


where 


AL(g)^P I 5]l{iV(g|y) > | ■l{fwiQ) < -a + MQ)} 


and 


(B.ll) 


= P +maxI{Y(g|y) > < e“”" • maxI{Y(Q|y) > j , (B 72 ) 

\ Q Q J 

For the first term, 


\Q:/w(Q)>-cxd / 

= I {T(gy; -a + fv{Q) - fw{Q), Qw) > i?} • I {fwiQ) <-a + fv{Q)} , (B.14) 


where (a) is by Lemma 27. Upon averaging over $(\mathbf{X}_1, \mathbf{Y})$, we obtain the exponent $E_{A,L}$ of (126) (utilizing the
definition (124)).
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Moving on to the second term, similarly as in the analysis leading to (l87l) 


BdQ)=_ _E 

Q- fw{Q)<—a+fviQ) 

pj 1^ |E{A^(Q|y) > 1} < •l{A^(Q|y) > l}|n 

\Q¥=Q- fw{Q)>—oo 

|l < •l{iV(Q|y) > l}|j (B.15) 

= Cl(Q). (B.16) 

Q- fw{Q)<—a+fviQ) 

We now split the analysis into three cases: 

Cases 1 and 2: Assume 0 < I{Q) < R. An analysis similar to cases 1 and 2 in the proof of Theorem 0 shows 
that 

Cl(Q) = I {T(Qy; -a + fv{Q) - fw{Q), Qw) > i?} I {-a + fv(Q) - fw{Q) > o} . (B.17) 

Case 3: Assume that $I(Q) > R$. An analysis similar to case 3 in the proof of Theorem 6 shows that the inner
probability in (B.16) is exponentially equal to


Cl(Q) = I {T(gy; -a + fviQ) - fwiQ), Qw) > i?} I {-a + MQ) - fw{Q) > o} e-<BQ)-R), (b.i 8 ) 

Returning to (IB. 161 ) we obtain that BdQ) is exponentially equal to the maximum between 

_ _ _ max _ _ l(T{QY]-a +fv{Q) - fw{Q),Qw) > R\, (B.19) 

Q'-fwiQ)<—a+fviQ),l{Q)<R,fv{Q)>a+fwiQ) ^ ^ 

and 


_max _ _l{TiQY;-a + fv{Q)-fw{Q),Qw)>R}e-<^^^^-^\ (B.20) 

Q-fw{Q)<—a+fv{Q),I{Q)>R,fv{Q)>a+fw(Q) ^ ^ 

or, more succinctly, 

B{Q) = maxi {T(gy; -a + fv(Q) - fw{Q), Qw) > (B. 21 ) 

Q 

where the maximization is over 

|Q : fw{Q) <-a + fv{Q), fv{Q) > O' + fw{Q)'^ ■ (B. 22 ) 

Upon averaging over $(\mathbf{X}_1, \mathbf{Y})$, we obtain the exponent $E_{B,L}$ of (129) (utilizing again (124)), and the
FA exponent (131) is proved using (B.10).

For the MD expression, since $\phi_L$ is not necessarily the optimal detector in the Neyman-Pearson sense, we cannot
use Proposition 4. However, due to the symmetry in $\mathcal{R}_{0,L}$ of $W$ and $V$, a similar observation as in Fact 7 holds,
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which leads directly to (1132b . The rest of the proof follows the same lines as the proof of theorem 


Appendix C 
Proof of Theorem [20I 

Proof of Theorem |2^ As in the proof of Theorem we only need to upper hound the FA prohahility as the 
MD probability can be easily evaluated from the FA bound, using Proposition 4. It remains to derive an upper bound
on the average FA error probability. We assume the ensemble of randomly selected codes of size $M = \lceil e^{nR} \rceil$, where
each codeword is selected independently at random, with i.i.d. components from the distribution $P_X$. Introducing
a parameter $\rho \ge \max\{s, 1 - s\}$, we continue the bound (A.4) as follows:

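$$P_{\mathrm{FA}}(\mathcal{C}_n,\phi') \le e^{-n\alpha s} \sum_{\mathbf{y}\in\mathcal{Y}^n} \left[ \frac{1}{M}\sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) \right]^{1-s} \left[ \frac{1}{M}\sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right]^{s} \qquad (C.1)$$

$$= e^{-n(\alpha s+R)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \left\{ \left[ \sum_{m=1}^{M} W(\mathbf{y}|\mathbf{x}_m) \right]^{(1-s)/\rho} \left[ \sum_{m=1}^{M} V(\mathbf{y}|\mathbf{x}_m) \right]^{s/\rho} \right\}^{\rho} \qquad (C.2)$$

$$\overset{(a)}{\le} e^{-n(\alpha s+R)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \left[ \sum_{m=1}^{M} \sum_{k=1}^{M} W^{(1-s)/\rho}(\mathbf{y}|\mathbf{x}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{x}_k) \right]^{\rho} \qquad (C.3)$$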

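where (a) follows from $\left(\sum_i a_i\right)^{\nu} \le \sum_i a_i^{\nu}$ for $\nu \le 1$. Using now the fact that the codewords are selected at random,
we obtain

$$\overline{P}_{\mathrm{FA}}(\mathcal{C}_n,\phi') \le e^{-n(\alpha s+R)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \mathbb{E}\left\{ \left[ \sum_{m=1}^{M} \sum_{k=1}^{M} W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{X}_k) \right]^{\rho} \right\} \qquad (C.4)$$

$$\overset{(a)}{\le} e^{-n(\alpha s+R)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \left\{ \sum_{m=1}^{M} \sum_{k=1}^{M} \mathbb{E}\left[ W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{X}_k) \right] \right\}^{\rho} \qquad (C.5)$$

where (a) is by restricting $\rho \le 1$ and using Jensen's inequality. For a given $\mathbf{y}$, let us focus on the inner expectation.
If $m = k$ then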


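$$\mathbb{E}\left[ W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{X}_m) \right] = \mathbb{E}\left[ \prod_{i=1}^{n} W^{(1-s)/\rho}(y_i|X_{m,i})\, V^{s/\rho}(y_i|X_{m,i}) \right] \qquad (C.6)$$

$$= \prod_{i=1}^{n} \mathbb{E}\left[ W^{(1-s)/\rho}(y_i|X_{m,i})\, V^{s/\rho}(y_i|X_{m,i}) \right] \qquad (C.7)$$

$$= \prod_{i=1}^{n} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y_i|x)\, V^{s/\rho}(y_i|x) \right) \qquad (C.8)$$

$$\triangleq \Psi_{s,\rho}(\mathbf{y}). \qquad (C.9)$$

Otherwise, if $m \ne k$, then since the codewords are selected independently

$$\mathbb{E}\left[ W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{X}_k) \right] = \mathbb{E}\left[ W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m) \right] \mathbb{E}\left[ V^{s/\rho}(\mathbf{y}|\mathbf{X}_k) \right] \qquad (C.10)$$

$$= \mathbb{E}\left[ \prod_{i=1}^{n} W^{(1-s)/\rho}(y_i|X_{m,i}) \right] \mathbb{E}\left[ \prod_{i=1}^{n} V^{s/\rho}(y_i|X_{k,i}) \right] \qquad (C.11)$$

$$= \prod_{i=1}^{n} \mathbb{E}\left[ W^{(1-s)/\rho}(y_i|X_{m,i}) \right] \mathbb{E}\left[ V^{s/\rho}(y_i|X_{k,i}) \right] \qquad (C.12)$$

$$= \prod_{i=1}^{n} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y_i|x) \right) \left( \sum_{x\in\mathcal{X}} P_X(x)\, V^{s/\rho}(y_i|x) \right) \qquad (C.13)$$

$$\triangleq \Gamma_{s,\rho}(\mathbf{y}). \qquad (C.14)$$

So, the double inner summand in (C.5) is bounded as

$$\left\{ \sum_{m=1}^{M} \sum_{k=1}^{M} \mathbb{E}\left[ W^{(1-s)/\rho}(\mathbf{y}|\mathbf{X}_m)\, V^{s/\rho}(\mathbf{y}|\mathbf{X}_k) \right] \right\}^{\rho} = \left\{ M \Psi_{s,\rho}(\mathbf{y}) + M(M-1)\, \Gamma_{s,\rho}(\mathbf{y}) \right\}^{\rho} \qquad (C.15)$$

$$\le 2^{\rho} \max\left\{ M^{\rho}\, \Psi_{s,\rho}^{\rho}(\mathbf{y}),\; M^{2\rho}\, \Gamma_{s,\rho}^{\rho}(\mathbf{y}) \right\} \qquad (C.16)$$

using $(c + d)^{\rho} \le [2 \max\{c, d\}]^{\rho}$ for any $c, d \ge 0$. Thus, we may continue the bound of (C.5) as

$$\overline{P}_{\mathrm{FA}}(\mathcal{C}_n,\phi') \le e^{-n(\alpha s+R)}\, 2^{\rho} \max\left\{ M^{\rho} \sum_{\mathbf{y}\in\mathcal{Y}^n} \Psi_{s,\rho}^{\rho}(\mathbf{y}),\; M^{2\rho} \sum_{\mathbf{y}\in\mathcal{Y}^n} \Gamma_{s,\rho}^{\rho}(\mathbf{y}) \right\}. \qquad (C.17)$$

The first term in the above maximization is given by

$$e^{-n\left(\alpha s-(\rho-1)R-\frac{\rho\log 2}{n}\right)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \prod_{i=1}^{n} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y_i|x)\, V^{s/\rho}(y_i|x) \right)^{\rho} \qquad (C.18)$$

$$= e^{-n\left(\alpha s-(\rho-1)R-\frac{\rho\log 2}{n}\right)} \prod_{i=1}^{n} \sum_{y\in\mathcal{Y}} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y|x)\, V^{s/\rho}(y|x) \right)^{\rho} \qquad (C.19)$$

$$= e^{-n\left(\alpha s-(\rho-1)R-\frac{\rho\log 2}{n}\right)} \left[ \sum_{y\in\mathcal{Y}} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y|x)\, V^{s/\rho}(y|x) \right)^{\rho} \right]^{n} \qquad (C.20)$$

$$= \exp\left[ -n \cdot \left( \alpha s + E_0'(s,\rho) - (\rho-1)R - \frac{\rho \log 2}{n} \right) \right] \qquad (C.21)$$

where $E_0'(s,\rho)$ was defined in (138). In a similar manner, the second term in the maximization is given by

$$e^{-n\left(\alpha s-(2\rho-1)R-\frac{\rho\log 2}{n}\right)} \sum_{\mathbf{y}\in\mathcal{Y}^n} \prod_{i=1}^{n} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y_i|x) \right)^{\rho} \left( \sum_{x\in\mathcal{X}} P_X(x)\, V^{s/\rho}(y_i|x) \right)^{\rho} \qquad (C.22)$$

$$= e^{-n\left(\alpha s-(2\rho-1)R-\frac{\rho\log 2}{n}\right)} \prod_{i=1}^{n} \sum_{y\in\mathcal{Y}} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y|x) \right)^{\rho} \left( \sum_{x\in\mathcal{X}} P_X(x)\, V^{s/\rho}(y|x) \right)^{\rho} \qquad (C.23)$$

$$= e^{-n\left(\alpha s-(2\rho-1)R-\frac{\rho\log 2}{n}\right)} \left[ \sum_{y\in\mathcal{Y}} \left( \sum_{x\in\mathcal{X}} P_X(x)\, W^{(1-s)/\rho}(y|x) \right)^{\rho} \left( \sum_{x\in\mathcal{X}} P_X(x)\, V^{s/\rho}(y|x) \right)^{\rho} \right]^{n} \qquad (C.24)$$

$$= \exp\left[ -n \cdot \left( \alpha s + E_0''(s,\rho) - (2\rho-1)R - \frac{\rho \log 2}{n} \right) \right] \qquad (C.25)$$

where $E_0''(s,\rho)$ was defined in (139). Definition (140) then implies the achievability in (141).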
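For discrete channels, the two Gallager/Forney-style functions appearing at the end of this proof can be evaluated directly. The sketch below assumes they take the single-letter forms $E_0'(s,\rho) = -\log\sum_y \big(\sum_x P_X(x) W^{(1-s)/\rho}(y|x) V^{s/\rho}(y|x)\big)^{\rho}$ and $E_0''(s,\rho) = -\log\sum_y \big(\sum_x P_X(x) W^{(1-s)/\rho}(y|x)\big)^{\rho} \big(\sum_x P_X(x) V^{s/\rho}(y|x)\big)^{\rho}$, as suggested by the per-letter sums in the derivation; the function names and the BSC helper are ours, and the definitions (138) and (139) in the main text remain the authoritative ones.

```python
import math


def bsc(p):
    """Transition matrix W[x][y] of a BSC with crossover probability p."""
    return [[1.0 - p, p], [p, 1.0 - p]]


def e0_prime(s, rho, px, W, V):
    """Assumed form of E_0'(s, rho): joint tilting of W and V by the same input."""
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(
            px[x] * W[x][y] ** ((1.0 - s) / rho) * V[x][y] ** (s / rho)
            for x in range(len(px))
        )
        total += inner ** rho
    return -math.log(total)


def e0_double_prime(s, rho, px, W, V):
    """Assumed form of E_0''(s, rho): W and V tilted by independent inputs."""
    total = 0.0
    for y in range(len(W[0])):
        a = sum(px[x] * W[x][y] ** ((1.0 - s) / rho) for x in range(len(px)))
        b = sum(px[x] * V[x][y] ** (s / rho) for x in range(len(px)))
        total += (a ** rho) * (b ** rho)
    return -math.log(total)
```

A quick consistency check is that both functions vanish at $s = 0$, $\rho = 1$, where the inner sums reduce to a probability distribution over $y$. A grid search over $(s, \rho)$ on top of these functions is one simple way to evaluate the resulting bound numerically.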
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Appendix D 
Proof of Theorem [2T] 

Proof of Theorem 21: Let us begin with the FA probability. We start again from the bound (A.4) and restrict

$s \le 1$
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yey" Lm=l 
M M 
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S'-““sEEE *(y|xm)f^*(y|xfc) 

m=l k=l yey" 

where (a) follows from < > for Z/' < 1. Let us denote the random variable 
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- E E f^'"^(y|X^)'F^(y|X,) 

fc=iyGT" 


(D.l) 

(D.2) 


(D.3) 


over a random choice of codewords from the i.i.d. distribution $P_X$. Introducing a parameter $\rho \ge 1$, for any given
$B > 0$, we may use the classical variation of the Markov inequality, as e.g. in [11, Eqs. (96)-(98)],
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(D.7) 


(D. 8 ) 


where (a) follows from Jensen inequality, and we have used the definitions of rs^p(y) and p(y) from (1C. 14b 
and (IC.9b . Now, as 


yey" yey"j=i \xex 
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xGX y£y 
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and 


ygy" yGy"j=i \xex J \xex / 


X X Pxix)W^-^iy\x) X 
yey \xex J yxeA” 


(D.ll) 


(D.12) 


then using the definition of E'^{s) and E'^{s) in (11431) and (11441) . respectively, as well as 

Exis,p,a,Px) = min |i£’^(s), -E”{s) - , 

[P P J 

we get that (lD.8b is 

p (Em > B) < ■ exp [-n ■ Ex(s, p, a)] . 


For any given 5 > 0 let us choose 


we obtain 


B* = e"'’/^4^exp [—n ■ pFx(s, p,a)] 


¥{Zm>B*)<]^e-"'/^r 

So, if we expurgate ^ of the had codewords in a randomly chosen codehook, then 


(D.13) 


(D.14) 


(D.15) 


(D.16) 
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IP \J {Zm>B*} \ < 




(D.17) 


\m=l 


where the prohahility is over the random codehooks (note also that this expurgation only causes the sum over k in 
(1D.31) to decrease). Indeed, to see this, define as the set of ‘had’ codes which have {Zm > B*} for more than 

nd 

half of the codewords. Assume hy contradiction, that the prohahility of a ‘had’ code is larger than e . Hence, 
from the symmetry of the codewords 


' (Zm >B*) = Y^ (Cn) I {Zm > B*} 


Cn m=l 

> E ‘’(^>5 

c„e£„ 


(D.18) 

(D.19) 

(D.20) 

(D.21) 


which contradicts (lD.16b . Namely, if we expurgate i of the had codewords of each codehook, then 


P,A(Cn, 00 < exp [-n • (R, a, Px,W, V) - d)] 


(D.22) 
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for all sufficiently large n, with probability tending exponentially fast to 1 (over the random ensemble). Then, 
Proposition |4] implies that also 

</>') < exp [-n • {R, a, Px, W,V)-a- 5)] . (D.23) 

Thus, one can find a single sequence of codebooks, of size larger than ^ which simultaneously achieves both 
upper bounds above. ■ 
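The expurgation argument of this appendix uses a fractional-moment form of the Markov inequality. In its generic form (stated here as an assumption about which variant is meant), for a non-negative random variable $Z$, a threshold $B > 0$ and $\rho \ge 1$, one has $\mathbb{P}(Z \ge B) \le \mathbb{E}[Z^{1/\rho}] / B^{1/\rho}$. The sketch below checks this on a small discrete distribution; the names are illustrative.

```python
def tail_probability(values, probs, threshold):
    """Exact P(Z >= threshold) for a discrete non-negative random variable."""
    return sum(p for z, p in zip(values, probs) if z >= threshold)


def fractional_markov_bound(values, probs, threshold, rho):
    """Bound E[Z^(1/rho)] / threshold^(1/rho), intended for rho >= 1."""
    moment = sum(p * z ** (1.0 / rho) for z, p in zip(values, probs))
    return moment / threshold ** (1.0 / rho)
```

Taking a fractional power before applying Markov's inequality is what allows the expectation to be moved inside the sum over codewords (via Jensen's inequality), which is the step that makes the expurgated bound tractable.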


Appendix E 

Simplified Expressions for BSC 


In Subsection IV-A (respectively, IV-C), the exponents (47) and (52) (respectively, (126) and (129)) are given as
minimization problems over the joint types $Q, \bar{Q}$, and also over $\tilde{Q}$, via $s(Q_Y,\gamma)$ (respectively, $t(Q_Y,\gamma)$). These
joint types are constrained to $Q_X = \bar{Q}_X = \tilde{Q}_X = P_X$ and $Q_Y = \bar{Q}_Y = \tilde{Q}_Y$. To obtain simplified expressions,
we will show that the optimal joint types are symmetric, to wit, they result from an input distributed according to
$P_X$ which undergoes a BSC. Thus, as both the input and output distributions for such symmetric joint types are
uniform, it only remains to optimize over the crossover probabilities $q, \bar{q}, \tilde{q}$.

To prove the above claim, we introduce some new notation for previously defined quantities, specialized to the
binary symmetric case. For $q, q_1, q_2 \in [0,1]$, the binary normalized log-likelihood ratio is defined as

$$f_{W,B}(q) \triangleq \frac{1}{n} \log\left[ (1-w)^{n(1-q)}\, w^{nq} \right] \qquad (E.1)$$

$$= \log(1-w) - q\beta_w, \qquad (E.2)$$

where $\beta_w \triangleq \log\frac{1-w}{w}$, the binary entropy is denoted by

$$h_B(q) \triangleq -q \log q - (1-q) \log(1-q), \qquad (E.3)$$

and the binary information divergence is denoted by

$$D_B(q_1\|q_2) \triangleq q_1 \log\frac{q_1}{q_2} + (1-q_1) \log\frac{1-q_1}{1-q_2}. \qquad (E.4)$$

For a given type $Q$, let us define the average crossover probability

$$q(Q) \triangleq \frac{1}{2}\left[ Q_{Y|X}(0|1) + Q_{Y|X}(1|0) \right], \qquad (E.5)$$

and let $\mathcal{Q}$ be a set of joint types, for which the inclusion of $Q$ in $\mathcal{Q}$ depends on $Q$ only via $q(Q)$. It is easy to
verify the following facts:

1) The information divergence satisfies 

mm. D{Qy\xW\Px) = min C»b(9||w^)- (E.6) 

Qy\x&Q 0<(J<1 
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from the convexity of the information divergence in Qy\x symmetry of Px and W. 

2) The normalized log-likelihood ratio $f_W(Q)$ depends on $Q$ only via $q(Q)$, and so

$$f_W(Q) = \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} Q(x,y) \log W(y|x) \qquad (E.7)$$

$$= (1 - q(Q)) \log(1-w) + q(Q) \log(w) \qquad (E.8)$$

$$= f_{W,B}(q(Q)). \qquad (E.9)$$

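3) Let $L(q)$ be a linear function of $q$. Then

$$\max_{Q_Y} \min_{\tilde{Q}:\, \tilde{Q}_Y = Q_Y} \left\{ I(\tilde{Q}) + L[q(\tilde{Q})] \right\} = \min_{0 \le \tilde{q} \le 1} \left\{ \log 2 - h_B(\tilde{q}) + L(\tilde{q}) \right\}. \qquad (E.10)$$

To see this, note that $I(\tilde{Q})$ is concave in $\tilde{Q}_Y$ (as the input distribution to the reverse channel $\tilde{Q}_{X|Y}$), and
$L[q(\tilde{Q})]$ is linear in $\tilde{Q}_Y$. So,

$$\min_{\tilde{Q}:\, \tilde{Q}_Y = Q_Y} \left\{ I(\tilde{Q}) + L(q(\tilde{Q})) \right\} = \min_{\tilde{Q}_{X|Y}} \left\{ I(Q_Y \times \tilde{Q}_{X|Y}) + L\left[ q(Q_Y \times \tilde{Q}_{X|Y}) \right] \right\} \qquad (E.11)$$

is a pointwise minimum of concave functions in $Q_Y$ and thus a concave function. Moreover, it is symmetric
in the sense that if $Q_Y(0)$ is replaced with $Q_Y(1)$, and $\tilde{Q}_{X|Y}(\cdot|0)$ replaced with $\tilde{Q}_{X|Y}(\cdot|1)$, then the same
value for the objective function is obtained. This fact along with concavity implies that the maximizing $Q_Y$
is uniform. Since $P_X$ is also uniform, the minimizing $\tilde{Q}_{X|Y}$ is also symmetric.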
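Fact 2 can be checked numerically for any joint type with a uniform input marginal, not only for symmetric ones. The sketch below assumes the forms $f_{W,B}(q) = \log(1-w) - q\log\frac{1-w}{w}$ and $q(Q) = \frac{1}{2}[Q_{Y|X}(0|1) + Q_{Y|X}(1|0)]$, and also includes the binary divergence for completeness; the function names are ours.

```python
import math


def f_binary(q, w):
    """Binary normalized log-likelihood: log(1 - w) - q * log((1 - w) / w)."""
    return math.log(1.0 - w) - q * math.log((1.0 - w) / w)


def d_binary(q1, q2):
    """Binary information divergence D_B(q1 || q2) in nats, for 0 < q2 < 1."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(q1, q2) + term(1.0 - q1, 1.0 - q2)


def average_crossover(Q):
    """q(Q) = 0.5 * [Q_{Y|X}(0|1) + Q_{Y|X}(1|0)] for a joint type Q[x][y]."""
    px0 = Q[0][0] + Q[0][1]
    px1 = Q[1][0] + Q[1][1]
    return 0.5 * (Q[1][0] / px1 + Q[0][1] / px0)


def f_joint(Q, w):
    """Direct evaluation of sum_{x,y} Q(x,y) log W(y|x) for a BSC(w)."""
    W = [[1.0 - w, w], [w, 1.0 - w]]
    return sum(Q[x][y] * math.log(W[x][y]) for x in range(2) for y in range(2))
```

For the asymmetric joint type `[[0.45, 0.05], [0.15, 0.35]]`, whose input marginal is uniform, `average_crossover` returns 0.2 and `f_joint` agrees with `f_binary(0.2, w)`.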

We are now ready to provide the various bounds for detection of two BSCs under uniform input using the facts 
above. 


A. Exact Random Coding Exponents 

Let us begin with $E_A$ of (47). Assume by contradiction that the optimal $Q^*$ is not symmetric. Fact 1 implies
that if the inputs are permuted, $Q^*(\cdot|0) \leftrightarrow Q^*(\cdot|1)$, and this joint type is averaged with $Q^*$ with weight $\frac{1}{2}$ to result
in a new type $Q^{**}$, then

$$D(Q^{**}_{Y|X}\|W|P_X) \le D(Q^{*}_{Y|X}\|W|P_X). \qquad (E.12)$$

Also, Fact 2 implies that $Q^{**} \in \mathcal{J}_1$. In addition, since the function $J(Q) = -\alpha + f_V(Q) - f_W(Q)$ is linear in
$Q$ and depends on $Q$ only via $q(Q)$, then Remark 11 and Fact 3 above imply that $Q^{**} \in \mathcal{J}_2$. Consequently, the
optimal $Q^*$ must be symmetric, and the minimization problem involved in computing $E_A$ (47) may be reduced to
optimizing only over crossover probabilities, rather than joint types. The result is as follows. Let $\gamma_{wv} \triangleq \log\frac{1-v}{1-w}$.
Then,

$$\mathcal{J}_{1,B} \triangleq \left\{ q : f_{W,B}(q) + \alpha - f_{V,B}(q) \le 0 \right\} \qquad (E.13)$$

$$= \left\{ q : q(\beta_v - \beta_w) \le -\alpha + \gamma_{wv} \right\} \qquad (E.14)$$
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and 


(E.15) 

(E.16) 


J 2 ,B = \ q ■ max min {log2 - h^iq) + A [-a + fv,B{q) - fw,Biq)]} > R ^ 

0<A<10<g<l J 

= \q : max {log2 - K{q*) + A [-a + fv,B{q) - fw^Q*)]} > R 
where (a) is obtained by simple differentiation and q* = ■ Then, 

Ea,b - min DsiqWw). 

Let us now inspect $E_B$ of (52). The same reasoning as above shows that the optimal $(Q, \bar{Q})$ must be symmetric.
Now, let


(E.17) 


^ 2 ,b = {{q, q) ■■ q{pv - Pw) <-a + 7 ^^} 

^3,b = {(?,q) : fvA^) >a + fwA^) - [7?- log 2 + /ib®] + } 


(E.18) 

(E.19) 


and 


/C 4 ,b = I {q,q) : max min {log 2 - hj,{q) + A [-a + fv,B{q) - fw,B{q) + [i? - log2 + /iB(g)]+] } > 


(E.20) 


= I {q,q) : rnax {log 2 - hsiq*) + A [-a + fv,Biq) - fw,Biq*) + [7? - log2 + /iB(g)]+] } > -R) (E-21) 

I J 


we obtain 


Eb,b- min DB(g||w^) + [log2 -/iB(g) - i?] , 
(q,q)enU2lCi.B 


(E.22) 


The most difficult optimization problem to solve, namely Eb,b, is only two-dimensional. 


B. Expurgated Exponents 

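The Chernoff distance (119) for a pair of BSCs with crossover probabilities $w$ and $v$ is

$$d_s(x,\tilde{x}) = \begin{cases} -\log\left[ (1-w)^{1-s} v^{s} + w^{1-s} (1-v)^{s} \right], & x \ne \tilde{x} \\ -\log\left[ (1-w)^{1-s} (1-v)^{s} + w^{1-s} v^{s} \right], & x = \tilde{x} \end{cases} \qquad (E.23)$$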
Now, let us analyze (11211) . Since Px is uniform, then the definition of the set E in (1120b implies that Pxx 
symmetric. So, 

E^^{R,a,Px^W,V) = max min {as + (1 — q)ds(l) 0) + gds(0,0) + log 2 —/iB(g) — i?} (E.24) 

0<s<l q- log 2-hB{q)<R 

= max {as + (1 - q*)ds{l, 0) + g*ds(0,0) + log 2 - hji{q*) - R} 

0<s<l 


(E.25) 
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where 


exp 


Q = 


i (4(1,0) -4(0,0)) 


1 + exp 


i (4(1,0)-4(0,0)) 


and /r > 1 is either chosen to satisfy 4(9*) = log 2 — R or fj, = 1. 


(E.26) 


C. Exact Random Coding Exponents of Simplified Detectors/Decoders 

As was previously mentioned, the simplified detector/decoder for high rates is useless in this case. For the
simplified detector/decoder for low rates, we may use the same reasoning as for the optimal detector/decoder. Let
$\mathcal{J}_{1,L,B} \triangleq \mathcal{J}_{1,B}$ and


272 ,L,B = \ q ■ max min {log 2 - 4(g) + A [-a + fvfi{q) - 4,b(9)]} > R 7 

A>0 0<(j<l J 

= jg : max {log 2 - 4(g*) + X[-a + fy^siq) - 4,b(9*)]} > -r| 


where q* = 


( 1 — 


:■ Then, 


Lei /C 2 ,l,b — 7 C2 ,b and 


Ea, L,B = min DsiqWw). 

gG4=iJi,L.B 


7 C3 ,l,b = {(g, q) ■ 4 ,b(9 ) > a + /,^,b(9 )} , 


and 


Ihen 


7^4,L,B = S {q,q) ■ max min {log 2 - 4(g) + A [-a + 4,B(g) - /io,b(9)]} > R 

A>0 0<g<l 

= \ ( 9 , 9 ) : max {log 2 - 4 ( 9 *) + X[-a + fy^^{q) - fwAq*)]} > > 


Eb, L.B = _ min Ds{q\\w) + [log 2 - 4(g) - i?]_ 

iq,q)enU2X^i,L,B 
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